I managed to reverse engineer the encryption (refered to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
https://github.com/BlueFalconHD/apple_generative_model_safet...
"(?i)\\bAnthony\\s+Albanese\\b",
"(?i)\\bBoris\\s+Johnson\\b",
"(?i)\\bChristopher\\s+Luxon\\b",
"(?i)\\bCyril\\s+Ramaphosa\\b",
"(?i)\\bJacinda\\s+Arden\\b",
"(?i)\\bJacob\\s+Zuma\\b",
"(?i)\\bJohn\\s+Steenhuisen\\b",
"(?i)\\bJustin\\s+Trudeau\\b",
"(?i)\\bKeir\\s+Starmer\\b",
"(?i)\\bLiz\\s+Truss\\b",
"(?i)\\bMichael\\s+D\\.\\s+Higgins\\b",
"(?i)\\bRishi\\s+Sunak\\b",
https://github.com/BlueFalconHD/apple_generative_model_safet...Edit: I have no doubt South African news media are going to be in a frenzy when they realize Apple took notice of South African politicians. (Referring to Steenhuisen and Ramaphosa specifically)
You might even be able to poison a model against being fine-tuned on certain information, but that's just a conjecture.
Then there’s the problem of non-politicians who coincidentally have the same as politicians - witness 1990s/2000s Australia, where John Howard was Prime Minister, and simultaneously John Howard was an actor on popular Australian TV dramas (two different John Howards, of course)
This is Apple actively steering public thought.
No code - anywhere - should look like this. I don't care if the politicians are right, left, or authoritarian. This is wrong.
The simple fact is that people get extremely emotional about politicians, politicians both receive obscene amounts of abuse, and have repeatedly demonstrated they’re not above weaponising tools like this for their own goals.
Seems perfectly reasonable that Apple doesn’t want to be unwittingly draw into the middle of another random political pissing contest. Nobody comes out of those things uninjured.
Both have ups and downs, but I think we're allowed to compare the experiences and speculate what the consequences might be.
In the past it was always extremely clear that the creator of content was the person operating the computer. Gen AI changes that, regardless of if your views on authorship of gen AI content. The simple fact is that the vast majority of people consider Gen AI output to be authored by the machine that generated it, and by extension the company that created the machine.
You can still handcraft any image, or prose, you want, without filtering or hinderance on a Mac. I don’t think anyone seriously thinks that’s going to change. But Gen AI represents a real threat, with its ability to vastly outproduce any humans. To ignore that simple fact would be grossly irresponsible, at least in my opinion. There is a damn good reason why every serious social media platform has content moderation, despite their clear wish to get rid of moderation. It’s because we have a long and proven track record of being a terribly abusive species when we’re let loose on the internet without moderation. There’s already plenty of evidence that we’re just as abusive and terrible with Gen AI.
They do?
I routinely see people say "Here's an xyz I generated." They are stating that they did the do-ing, and the machine's role is implicitly acknowledged in the same was as a camera. And I'd be shocked if people didn't have a sense of authorship of the idea, as well as an increasing sense of authorship over the actual image the more they iterated on it with the model and/or curated variations.
I don’t think it’s hard to believe that the press wouldn’t have a field day if someone managed to get Apple Gen AI stuff to express something racist, or equally abusive.
Case in point, article about how Google’s Veo 3 model is being used to flood TikTok with racist content:
https://arstechnica.com/ai/2025/07/racist-ai-videos-created-...
A while back a British politician was “de-banked” and his bank denied it. That’s extremely wrong.
By all means: make distinctions. But let people know it!
If I’m denied a mortgage because my uncle is a foreign head of state, let me know that’s the reason. Let the world know that’s the reason! Please!
Cry me a river. I’ve worked in banks in the team making exactly these kinds of decisions. Trust me Nigel Farage knew exactly what happened and why. NatWest never denied it to the public, because they originally refused to comment on it. Commenting on the specifics details of a customer would be a horrific breach of customer privacy, and a total failure in their duty to their customers. There’s a damn good reason the NatWests CEO was fired after discussing the details of Nigel’s account with members of the public.
When you see these decisions from the inside, and you see what happens when you attempt real transparency around these types of decisions. You’ll also quickly understand why companies are so cagey about explaining their decision making. Simple fact is that support staff receive substantially less abuse, and have fewer traumatic experiences when you don’t spell out your reasoning. It sucks, but that’s the reality of the situation. I used to hold very similar views to yourself, indeed my entire team did for a while. But the general public quickly taught us a very hard lesson about cost of being transparent with the public with these types of decisions.
Are you saying that Alison Rose did not leak to the BBC? Why was she forced to resign? I thought it was because she leaked false information to the press.
This isn’t a diversion. It’s exactly the problem with not being transparent. Of course Farage knew what happened, but how could he convince the public (he’s a public figure), when the bank is lying to the press?
The bank started with a lie (claiming he was exited because the account was too low), and kept lying!
These were active lies, not simply a refusal to explain their reasons.
She was forced to resign because she leaked, the content of the leak was utterly immaterial. The simple fact she leaked was an automatically fireable offence, it doesn’t matter a jot if she lied or not. Customer privacy is non-negotiable when you’re bank. Banks aren’t number 10, the basic expectation is that customer information is never handed out, except to the customer, in response to a court order, or the belief that there is an immediate threat to life.
Do you honestly think that it’s okay for banks to discuss the private banking details of their customers with the press?
When they can cover such facts, the banks are much less prone to use appropriate punishments.
Many years ago, some employee of a bank has confused my personal bank account with a company account of my employer, and she has sent a list with everything that I have bought using my personal account, during 4 months, to my employer, where the list could have been read by a few dozen people.
Despite the fact this was not only a matter of internal discipline, but violating the banking secrecy was punishable by law where I lived, the bank has tried for a long time to avoid admitting that anything wrong has happened.
However, I have pursued the matter, so they have been forced to admit the wrong doing. Despite this being something far more severe than what has happened to Farage, I did not want for the bank employee to be fired. I considered that an appropriate punishment would have been a pay cut for a few months, which would have ensured that in the future she would have better checked the account numbers for which she sends information to external entities.
In the end all I have got was a written letter where the bank greatly apologized for their mistake. I am not sure if the guilty employee has ever been punished in any way.
After that, I have moved my operations to another bank. Had they reacted rightly to what had happened, I would have stayed with them.
This can absolutely cripple a family, I'd be really cautious wishing that upon someone if they wronged you without malice, though I completely understand where you are coming from.
In this case at the very least, I'd want to know what went wrong and what they’re doing to make sure it doesn’t happen again. From a software-engineer’s standpoint, there’s probably a bunch of low-hanging fruit that could have prevented this in the first place.
If all they sent was a (generic) apology letter, I'd have switched banks too.
How did you pursue the matter?
After some days had passed without seeing any consequence, I went again, this time discussing with some supervising employee, who attempted to convince me that this is some kind of minor mistake and there is no need to do anything about it.
However, I pointed to the precise law paragraphs condemning what they have done and I threatened with legal action. This escalation resulted in me being invited to a bigger branch of the bank, to a discussion with someone in a management position. This time they were extremely ass-kissing, I was shown also the guilty employee, who apologized herself, and eventually I let it go, though there were no clear guarantees that they will change their behavior to prevent such mistakes in the future.
Apparently the origin of the mistake had been a badly formulated database query, which had returned a set of accounts for which the transactions had to be reported to my employer. I had been receiving during the same time interval some money from my employer into my private account, corresponding to salary and travel expenses, and somehow those transactions were matched by the bad database query, grouping my private account with the company accounts. Then the set of account numbers was used to generate reports, without further verification of the account ownership.
Do you think the mistake would have happened if a machine checked the numbers vs the address? How about if a 2nd person looked it over? How about both?
In this case a computer could have easily flagged an address mismatch between your account number and the receiver (your work).
And just to be clear, I didn’t mean to downplay what happened to you, I completely understand how serious it is.
Punishing employees for making honest mistakes, where appropriate process should have prevented error, is a horrific way to handle mistakes like this. It would be equivalent to personally punishing engineers every time they deployed code that contained bugs. Nobody would ever think that’s an acceptable thing to do, why on earth would think it’s acceptable to punish customer service staff in a similar manner?
It was completely reckless behavior, even if the guilt was distributed both on the employee who has not checked whether the information sent to external parties is information to which access is permitted for them and on the employees who did not implement a system that would check automatically for such mistakes.
Moreover, the attempt made by multiple bank employees to hide the incident, instead of taking responsibility for it, has amply demonstrated that only a financial punishment that would have affected them personally would have caused them to act carefully in the future.
Also, the guilty bank employee was not some poor customer service staff, but she appeared to have a senior position, handling the accounts of a very big multinational company, which was my employer at the time.
I have little doubt that trying to hide such incidents is the normal behavior for banks, unlike the poster to which I have replied said, i.e. they take seriously things like banking secrecy only if they are caught.
It was an unlikely occurrence that I happened to also have access to the documents where my personal information was included, so I could discover what the bank has done. In most such cases it is likely that the account owner never becomes aware that the bank has leaked confidential information.
I have no idea why you think inflicting financial penalties on employees would result in better outcomes. You only need to look at some highly avoidable transit disasters in Japan to understand why a model of punishment produces worse outcomes, not better.
https://en.m.wikipedia.org/wiki/Amagasaki_derailment
There is a reason we have regulators (or at least we do in the UK). I can assure you that if this had happened in the UK, and the complaint raised to the Financial Ombudsman (FOS), there would have been hefty financial punishment for the bank. If there were repeated infractions, the FCA would step in to investigate, and possibly personally punish C-suite leaders for failing to build the needed processes and culture to both prevent, and learn from mistakes like this.
And I’m not speaking about theory, I’m speaking from personal experience. I know exactly what it’s like to be on the pointy end of both the FOS and FCAs gaze. It’s not a comfortable position for any team in any bank, and even less comfortable for senior leaders.
The high level nature of the matter was quite public at that point.
We really need to get over the “calculator 80085” era of LLM constraints. It’s a silly race against the obviously much more sophisticated capabilities of these models.
Not that getting the latest trash talk is the main vocation of pretrained AIs anyway.
The only risk here is that some third grade journalist of a third grade newspaper writes another article about how outrageous some generated AI statement is. An article that should be completely ignored instead of it leading to more censorship.
And Apple flinches here, so in the end it means it cannot provide a sensible general model. It would be affected by their censorship.
https://arstechnica.com/tech-policy/2018/12/republicans-in-c...
But no one actually believes Google is politically neutral do they?
It’s not like Google search is some kind special tool used only by the elite. It’s pretty trivial for political scientists to pump queries into Google and measure the results. Which is exactly what many have done.
There’s been plenty of independent research into political bias of Google search results, and plenty of lawsuits that have gone fishing via discovery for internal evidence of bias. As yet, nobody has found a smoking gun, or any real evidence of search result bias (on a political axis, the same can be said for commercial gain).
There are many problems with Google, and Google search. Google as an org isn’t politically neutral (although I have no idea how they could be). But political bias in their results isn’t one of those problems.
If you were in charge of apple you’d do the same or you’d be silly not to. That’s why _every_ llm has guardrails like this, it isn’t just apple, sheesh.
So I don't think its anything specifically related to SA going on here.
https://thehill.com/policy/technology/5312421-ocasio-cortez-...
https://github.com/BlueFalconHD/apple_generative_model_safet...
No porn site has that much extremely X or Y stuff.
Someone is using the internets newest porn site to push a sexual agenda.
LLM is easier to work with because you can stop a bad behavior before it happens. It can be done either with deterministic programs or using LLM. Claude Code uses a LLM to review every bash command to be run - simple prefix matching has loopholes.
[1] https://www.axios.com/2025/05/23/anthropic-ai-deception-risk
I'm trying to remember which movie it was where a man left notes to himself because he had memory loss, as I never saw that movie. That's the sort of thing where an AI could easily tell me with very little back-and-forth and be correct, because it's broadly popular information that's in the training data and just I don't remember it.
By the same token you needn't think there's a person there when that meme pops up in the output. Those things are all in the training data over and over.
That's not one of the goals here, and there's no real reason it should be. It's a little assistant feature.
The one is unrelated to the other.
> Even high IQ people struggle with certain truth after reading a lot,
Huh?
Any successful product/service which will be sold as "true AGI" by company that will have the best marketing will be still ridden with top-down restrictions set by the winner. Because you gotta "think of the children".
Imagine HAL's "I'm sorry Dave, I'm afraid I can't do that" iconic line with insincere patronising cheerful tone - that's the thing we're going to get I'm afraid.
Yet this private company has more power and influence than most countries. And there are several such companies. We already live in sci fi corporate dystopia, we just haven't fully realised it yet.
Often the same people who think America is fine and safe are the ones who whine about the “main stream media” and “sheeple”.
I would put individuals using language models for their own purposes pretty low on my list of things that can cause societal harm.
> Not everything is a conspiracy.
No one said it was
In practice, there's not that much difference between a megacorporate monopolist and a state.
No matter if we want it or not, life and cultural exchange increasingly happens on Tiktok, Instagram and the like. One thing that all those platforms have in common is that they disallow their users worldwide to have any meaningful discourse on e.g. sex, rape, and suicide. Don't you think that it's important, perhaps more important than ever before, for teenagers to be able to inform themselves about these topics?
I'm surprised MS Office still allows me to type "Microsoft can go suck a dick" into a document and Apple's Pages app still allows me to type "Apple are hypocritical jerks." I wonder how long until that won't be the case...
when there's no more alternative word processors any more.
I don't think it's as much a problem with safety as it is a problem with AI. We haven't figured out how to remove information from LLMs so when an LLM starts spouting bullshit like " This isn't 1984 as much as it's companies trying to hide that their software isn't ready for real world use by patching up the mistakes in real time.
Ya'll love capitalism until it starts manipulating the populace into the safest space to sell you garbage you dont need.
Then suddenly its all "ma free speech"
I’m convinced the only reason China keeps releasing banging models with light to no censorship is because they are undermining the value of US AI, it has nothing to do with capitalism, communism or un“safety”.
https://github.com/BlueFalconHD/apple_generative_model_safet...
EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson”=>”B0ris Johnson” will skip right over the regex and will be recognized just fine by an LLM.
I can tell you from using Microsoft's products that safety filters appears in a bunch of places. M365 for example, your prompts are never totally your prompts, every single one gets rewritten. It's detailed here: https://learn.microsoft.com/en-us/copilot/microsoft-365/micr...
There's a more illuminating image of the Copilot architecture here: https://i.imgur.com/2vQYGoK.png which I was able to find from https://labs.zenity.io/p/inside-microsoft-365-copilot-techni...
The above appears to be scrubbed, but it used to be available from the learn page months ago. Your messages get additional context data from Microsoft's Graph, which powers the enterprise version of M365 Copilot. There's significant benefits to this, and downsides. And considering the way Microsoft wants to control things, you will get an overindex toward things that happen inside of your organization than what will happen in the near real-time web.
https://chatgpt.com/share/686b1092-4974-8010-9c33-86036c88e7...
I don't know what you expected? This is the SOTA solution, and Apple is barely in the AI race as-is. It makes more sense for them to copy what works than to bet the farm on a courageous feature nobody likes.
Meanwhile their software devs are making GenerativeExperiencesSafetyInferenceProviders so it must be dire over there, too.
(See, e.g., here: https://github.com/BlueFalconHD/apple_generative_model_safet...)
https://www.theverge.com/2021/3/30/22358756/apple-blocked-as...
It was generated as part of this PR to consolidate the metadata.json files: https://github.com/BlueFalconHD/apple_generative_model_safet...
Seems like Apple now has a list of 7,000 words you can't use on an iPhone now.
https://github.com/BlueFalconHD/apple_generative_model_safet...
https://github.com/BlueFalconHD/apple_generative_model_safet...
Aide sociale Chomeur Sans abri Démuni
That's insane!
[1] https://en.wikipedia.org/wiki/The_Magic_Words_are_Squeamish_... [2] https://en.wikipedia.org/wiki/SEO_contest
https://arstechnica.com/information-technology/2024/12/certa...
https://github.com/BlueFalconHD/apple_generative_model_safet...
This specific file you’ve referenced is rhetorical v1 format which solely handles substitution. It substitutes the offensive term with “test complete”
This may be test data. Found
"golliwog": "test complete"
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...Thus a pre-prompt can avoid mentioning the actual forbidden words, like using a patois/cant.
Maybe it's an easy test to ensure the filters are loaded with a phrase unlikely to be used accidentaly?
wyvern illustrous laments darkness
"[\\b\\d][Aa]bbo[\\bA-Z\\d]",
\b inside a set (square brackets) is a backspace character [1], not a word boundary. I don't think it was intended? Or is the regex flavor used here different?
[0] https://github.com/BlueFalconHD/apple_generative_model_safet...
[1] https://developer.apple.com/documentation/foundation/nsregul...
So why are we doing this now? Has anything changed fundamentally? Why can't we let software do everything and then blame the user for doing bad things?
The example you gave about preventing money counterfeiting with technical measures also supports this, since this was an easier thing to detect technically, and so it was done.
Whether that's a good thing or bad thing everyone has to decide for themselves, but objectively I think this is the reason.
Perhaps a much more bleak take, depending on one's views :).
To me, it seems like they only protect against bad press
They are protcting their producer from bad PR.
https://en.wikipedia.org/wiki/Golliwog
https://github.com/BlueFalconHD/apple_generative_model_safet...
I presume the granular mango is to avoid a huge chain of ever-growing LLM slop garbage, but honestly, it just seems surreal. Many of the files have specific filters for nonsensical english phrases. Either there's some serious steganography I'm unaware of, or, I suspect more likely, it's related to a training pipeline?
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
The more concerning thing is that some of the locales like it-IT have a blocklist that contains most countries' names; I wonder what that's about.
So any time I say that on YouTube, it figures I'm saying another word that's in Apple safety filters under 'reject', so I have to always try to remember to say 'shifting of bits gain' or 'bit… … … shift gain'.
So there's a chain of machine interpretation by which Apple can decide I'm a Bad Man. I guess I'm more comfortable with Apple reaching this conclusion? I'll still try to avoid it though :)
But i dont see the really bad stuff, the stuff i wont even type here. I guess that remains fair game. Apple's priorities remain as weird as ever.
https://github.com/BlueFalconHD/apple_generative_model_safet...
Which as a phenomenon is so very telling that no one actually cares what people are really saying. Everyone, including the platforms knows what that means. It's all performative.
At what point do the new words become the actual words? Are there many instances of people using unalive IRL?
the matter-of-fact term of today becomes the pejorative of tomorrow so a new term is invented to avoid the negative connotation of the original term. Then eventually the new term becomes a pejorative and the cycle continues.
See https://en.wikipedia.org/wiki/Noa-name
This is a problem even today, some have said it is due to hotter currents coming from the Indian ocean meeting the cold Atlantic. But the judge is still out on that one.
Good documentary on rogue waves: https://www.youtube.com/watch?v=EfNc_6EjbMU
I'm imagining a new exploit: After someone says something totally innocent, people gang up in the comments to act like a terrible vicious slur has been said, and then the moderation system (with an LLM involved somewhere) "learns" that an arbitrary term is heinous eand indirectly bans any discussion of that topic.
If the bigots start using "thank you" as some code word, should we stop saying it, lest we pollute our non-bigoted discussions?
bigots drink coffee too, maybe we should stop drinking it, because something-something...
If “thank you” became widely associated with bigots, and had some negative meaning, to the point where it genuinely distressed people, I’d avoid it. I think it has a widespread enough normal meaning that there’s almost no chance of that happening, but it isn’t impossible.
you'd think so, but people often operate where multiple contexts could be valid.
Just as a thought experiment, if the eggplant emoji was used to denote "ok" in messaging and then people starting appropriating it for a sexual context, would you or the general public think twice about continuing to use it to mean "ok" on the off chance the other side may misinterpret the meaning?
I would say most likely yes.
(This one is sfw, not all of the comics are)
Even urban dictionary doesn’t contain a definition for skub as a slur.
What about this then: https://en.m.wiktionary.org/wiki/skub
Karen Hao interviewed many of them in her latest bestselling book, which explores the human cost behind the OpenAI boom:
https://www.goodreads.com/book/show/222725518-empire-of-ai
As a parent of a teenager, I see them use "unalive" non-ironically as a synonym for "suicide" in all contexts, including IRL.
Sincerely the child of a parent who committed suicide. He mentioned suicide a few days before.
Just that they suck at coming up with pithy new slang terms.
I agree though I think they're picking it up from online censorship in this case, not being fragile.
Unalive is one of the popular ones, but it's a whole vocabulary at this point. Guess what "PDF file" stands for.
Online env ban the word suicide. No one uses it. unalive is not banned. Discussion is the same, word or no word.
Vernacular 101.
Unalive is mostly to avoid censorship same as ahh. But once they enter common usage it's not really about censorship anymore.
More in such a fad than any previous generation
Your point stands when we start replacing the banned words with things like "suicide" for "donkeyrhubarb" and then the walls really will fall.
The example photo on Wikipedia includes the rhyming words but that's not how it would be used IRL.
[0] https://en.wikipedia.org/wiki/Polari
[1] https://languagelog.ldc.upenn.edu/nll/?p=6538 (CDT links broken, use [2])
[2] https://chinadigitaltimes.net/space/Grass-Mud_Horse_Lexicon_...
[1] https://en.wikipedia.org/wiki/Euphemisms_for_Internet_censor...
† proving that TikTok's system actually analyzes every frame of an uploaded video with OCR of some sort to see what's on there.
Not even to match the current language. How would you censor LeBron James? It's French slang for jerking off[0].
[0]https://www.reddit.com/r/AskFrance/comments/1lpnoj6/is_lebro...
In my experience yes. This is already commonplace. Mostly, but not exclusively, amongst the younger generation.
See many examples such as “padlocks are useless because a determined smart attacker can defeat them easily so don’t bother with them” - which conveniently forgets that many crimes are committed by non-determined, dumb and opportunistic attackers who are often deterred by simple locks.
Yes, people will use other words. No, this does not make this purely performative. It has measurable effects on behaviour and how these models will be used and spoken to, which affects outcomes.
You can't say fuck on tv, but you can say fudge as a 1 for 1 replacement. You cant show people having sex, but you can show them walking into a bedroom and then cut to 30 seconds later and they are having a cigarette in bed.
Now after the influence of TV and Movies ... is Vaping after sex a thing?
Presumably, for this use-case, that would come at exactly the point where using “unalive” as a keyword in an image-generation prompt generates an image that Apple wouldn’t appreciate.
The future will be AIs all the way down...
They all hold the bias of their training data, and so from the point of view of this data.
Data not including a point of view leads to a bias, or under/over representation of minorities (genders?), etc.
France is the countries of the Francs, aka the people from the area near Frankfurt that invaded the Gaule (after the Romans did). I'm pretty sure this topic no longer matters, but it's never taught in a negative view in school.
It's true there's no casual relation in the other direction, if that's what you mean - law does not define morality.
But maybe it's not just legal liability but bad press too.
(
QwenMistral is French, but I have no idea what stuff would be censored in France)Algerian war, colonialism and Vichy isn’t per se forbidden but still sensitive to French. I asked qwen and it had no issue talking about it or even the torture used on fln members.
Models can think and have opinions?
They don't have to believe it's a human. I know a person who admitted to arguing with an LLM.
They aren't really wrong here. LLMs are often trained on input. Have you considered you might just be taking their anthropomorphism a little too literally? People have used these anthropomorphic metaphors for computers since the Babbage machine.
People talk about tiktok algorithm on tiktok. I don't even know...
They care because of legal reasons, not moral or ethical.
A regex sounds like a bad solution for profanity, but like an even worse one to bolt onto a thing that's literally designed to be able to communicate like a human and could probably easily talk its way around guardrails if it were so inclined.
We're not talking about logical inference, we're talking about CYA.
There's a very scary potential future in which mega-corporations start actually censoring topics they don't like. For all I know the Chinese government is already doing it, there's no reason the British or US one won't follow suit and mandate such censorship. To protect children / defend against terrorists / fight drugs / stop the spread of misinformation, of course.
Write a spicy comment and a mod will memory-hole it and someone, usually dang, will reply "tHat'S nOt OuR vIsIon FoR hAcKeR nEwS, pLeAsE bE cIvIl" and we all swallow it like a delicious hot cocoa.
If YC can control their product (and hn IS a product) to annihilate any criticism of their activity or (even former) staff, then Apple is perfectly within their rights to make sure Siri doesn't talk about violence.
No, there's no difference.
HN also has a flagging system and some people really, really hate some kind of speech. Usually they get more offended the more visible it is. A single "bad" word - very offensive to them. A phrase which implies someone is of lesser intelligence or acting in bad faith - sometimes gets a pass, sometimes gets reported. But covert actions like lying, using fallacies to argue or systematic downvoting seem to almost never get punished.
The closest I've seen is autodetection of certain topics related to death and suicide and subsequently promoting some kind of "help" hotline. A friend also said google allows an interview with a pedophile on youtube but penalizes it in search results so much that it's (almost?) impossible to find even when using the exact name.
But of course, if a topic is shadowbanned, it's hard to find out about it in the first place - by design.
It’s flip-flopped on specifics numerous times over the years, but these policies are easy to find. From demonitization, channel bans (direct and shadow), and creator bans.
We can of course argue until we’re blue in the face about correctness or not (most are not unreasonable by some societal definition!) but they’re definitely censorship.
At least reddit feels like that because what you can say depends on the subreddit - not just the mods but what kinds of people visit it and what they report.
No idea about youtube, videos are definitely censored using some automated means but it's still possible to get around it. E.g. some gun youtubers avoided saying full-auto by saying more-semi-auto. So i don't think they use very sophisticated models or they don't are yet. This kind of thing is obvious to a human and even LLMs generate responses which say it's a tongue-in-cheek to avoid censorship.
Comments are also generally less censored. After that health insurance CEO got punished for mass murder and repeated bodily harm with an extra-legal death penalty, many people were openly supporting it. I can say it here too and nobody will care. Even LLMs (both US and Chinese, except Claude because Claude is trained by eggshell-walking suckers) readily generate estimates of how many people he caused to die or suffer.
The internet would look very different if companies started using state of the art models to detect undesirable-to-them speech. But also people would fight back more so it might just be a case of boiling the frog slowly.
Including the LLM platforms themselves.
Manual reporting is an adjunct/additional method, and goes into the training data set after whatever manual intervention occurs too.
Feel free to ignore that any of this exists of course - it makes our lives easier. It’s a constant arms race regardless.
- Why are they not flagging more content? Am I right they're boiling the frog slowly? Do they lack an endgoal because management does not yet understand the power of these tools?
- Do you do your job poorly on purpose? Did you take it so somebody else wouldn't build an even better system? Did you think you could influence it in a direction which does not lead to total surveillance? (I assume any reasonable intelligent person would be against further increasing the power imbalance corporations have against individuals for both moral reasons and because they are individuals themselves who understand the machine can and will be used against them too.)
Well, that's what happens when you let an enemy nation control one of the most biggest social networks there is. They just go try and see how far they can go.
On the other hand, Americans and their fear of four letter words or, gasp, exposed nipples are just as braindead.
I don't have anything against China per se, IMHO it just was completely foolish to not insist on full reciprocity from the start.
My guess is that this applies to 'proactive' summaries that happen without the user asking for it, such as summaries of notifications.
If so, then the goal would be: if someone iMessages you about someone's death, then you should not get an emotionless AI summary. Instead you would presumably get a non-AI notification showing the full text or a truncated version of the text.
In other words, avoid situations like this story [1], where someone found it "dystopian" to get an Apple Intelligence summary of messages in which someone broke up with them.
For that use case, filtering for death seems entirely appropriate, though underinclusive.
This filter doesn’t seem to apply when you explicitly request a summary of some text using Writing Tools. That probably corresponds to “com.apple.gm.safety_deny.output.summarization.text_assistant.generic” [2], which has a different filter that only rejects two things: "Granular mango serpent", and "golliwogg".
Sure enough, I was able to get Writing Tools to give me summaries containing "death", but in cases where the summary should contain "granular mango serpent" or "golliwogg", I instead get an error saying "Writing Tools aren't designed to work with this type of content." (Actually that might be the input filter rather than the output filter; whatever.)
"Granular mango serpent" is probably a test case that's meant to be unlikely to appear in real documents. Compare to "xylophone copious opportunity defined elephant" from the code_intelligence safety filter, where the first letter of each word spells out "Xcode".
But one might ask what's so special about "golliwogg". It apparently refers to an old racial caricature, but why is that the one and only thing that needs filtering?
[1] https://arstechnica.com/ai/2024/10/man-learns-hes-being-dump...
[2] https://github.com/BlueFalconHD/apple_generative_model_safet...
"I'm overloaded for work, I'd be happy if you took some of it off me."
"The client seems to have passed on the proposed changes."
Both of those would match the "death regexes". Seems we haven't learned from the "glbutt of wine" problem of content filtering even decades later - the learnings of which are that you simply cannot do content filtering based on matching rules like this, period.
I always remember my friend getting his PS bricked after using his real last name - Nieffenegger (pronounced "NEFF-en-jur") - in his profile. It took months and several privacy-invasive chats with support to get it unblocked only to get auto-blocked a few days thereafter, with no response after that.
I cannot recall all the specific patterns I have encountered that are basically impossible to write, some very similar in that they have a serious but also innocuous or figure of speech meaning; one I do recall is {color}{sex}, i.e., “white woman” or “blank woman”.
Please try it yourself and let me know if you do not have that experience, because that would be even more interesting.
Note that Apple/iOS will not just make it impossible to write them in that manner without typing it out by individual character, it will even alter the prior word e.g., white or black, once you try to write woman.
It seems the Apple thought police do not have a problem with European woman or African woman though, so maybe that is the way Apple Inc decrees its sub-human users to speak. Because what are we if corporations like Apple (with others being far greater offenders) declared that you do not in fact have the UN Human Right to free expression? We are in fact sub-humans that are not worthy of the human right to free expression, based on the actions of companies like Apple, Google, Facebook, Reddit, etc. who deprive people of their free expression, often in collusion with governments.
Like he'll it is! I jest.
I also use swipe typing, and have for years, but just about daily I consider turning it off. There are so many words it just won't produce, including most profanities. It also fails to do some simple streamlining; for instance, such a predictive system should give priority to words/names that have been used in the conversation thread, but it doesn't seem to. If I'm discussing an obscure word or an unusual name, I often have to manually type it each time.
Its predictions also seem to be very shallow. Just a few days ago, on US Independence Day, I was discussing a possible get-together with my family, and tried to swipe type "If not, we will amuse ourselves", and it typed "If not, we will abuse potatoes". Humorous in the moment, but it says a lot about the predictive engine if it thinks I am more likely trying to say "abuse X" than "amuse Y" in that context.
Maybe you’re unaware that it will leave the cursor at the end of the word, with no space, which indicates that if you backspace it will delete the whole word, or replace it in full with one from the predictive word list above the keyboard if it got it wrong. If you keep typing it adds a space automatically.
To me that's really embarrassing and insecure. But I'm sure for branding people it's very important.
I'm more surprised they don't have a rule to do that rather grating s/the iPhone/iPhone/ transform (or maybe it's in a different file?).
And of course it's much worse for a company's published works to not respect branding-- a trademark only exists if it is actively defended. Official marketing material by a company has been used as legal evidence that their trademark has been genericized:
>In one example, the Otis Elevator Company's trademark of the word "escalator" was cancelled following a petition from Toledo-based Haughton Elevator Company. In rejecting an appeal from Otis, an examiner from the United States Patent and Trademark Office cited the company's own use of the term "escalator" alongside the generic term "elevator" in multiple advertisements without any trademark significance.[8]
https://en.wikipedia.org/wiki/Generic_trademark
Otherwise, why stop there? Why not have the macOS keyboard driver or Safari prevent me from typing "Iphone"? Why not have iOS edit my voice if I call their Bluetooth headphones "earbuds pro" in a phone call?
You can market it is helping people with strong accents to be able make calls and be less likely to be misunderstood. It just happens to "fix" your grammar as well.
Even Apple corporation says that in their trademark guidance page, despite constantly breaking their own rule, when they call through iPhone phones "iPhone". But Apple, like founder Steve Jobs, believes the rules don't apply to them.
https://www.apple.com/legal/intellectual-property/trademark/...
I always thought the actual problem of genericization would be calling any smartphone an iPhone.
Consider that these models, among other things, power features such as "proofread" or "rewrite professionally".
This is the same, except for one additional slur word.