For context:
In ASCII text, CR and LF are control characters that mark the end of a line.
CR (Carriage Return) tells a machine to move the text cursor back to the beginning of the line.
LF (Line Feed) tells a machine to move the cursor down to the next line.
Some Windows programs can't read/write text files properly without the CR, so tools like Perforce will convert lone LFs in a text file to CRLF and ignore the difference when comparing files.
This means that if you have a binary file that's mistaken for a text file (it contains LF bytes in its data) and the same file after line-end conversion (so it contains CRLFs instead), Perforce will tell you there's no difference between the two files, while a hex editor will show you a few extra bytes of difference.
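To make that concrete, here's a rough sketch (in Python, with made-up example bytes) of how two files can be "the same text" and still not be the same bytes:

```python
# Hypothetical binary payload that happens to contain an LF byte (0x0a),
# and the same payload after a CRLF "text" conversion.
original  = b"\x01\x02\x0a\x03"
converted = b"\x01\x02\x0d\x0a\x03"

# A tool that normalizes line endings before comparing sees no difference...
print(converted.replace(b"\r\n", b"\n") == original)  # True
# ...but a hex editor compares raw bytes, and those differ.
print(original == converted)                          # False
print(len(converted) - len(original))                 # 1 extra byte per converted LF
```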
That extra byte difference caused a game I'm working on to crash, but only on machines with a fresh install and not my dev environment.
It took me nearly a week of struggling before finally comparing the files in a Hex Editor.
God I hate Perforce...
I recall there was also a compatibility bug in windows back in the day which added carriage returns to line feeds.
In what contexts? Could it just be a matter of the program splitting the text into lines in memory and assuming CRLF when reconstructing the file?
I think I saw it in a video about decompiling LEGO Island or something similar. I'll have to fact check.
I know MattKC specifically talked about this in his "Putting a game in a QR code" video. The software he used interpreted the binary as text and inserted a CR after every LF byte.
Ah. So that's why I remembered something like that. Sorry for misinformation and thanks for correcting me.
Not a bug. The difference between opening a stream in text mode vs binary mode.
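For anyone who hasn't run into this: in Python, for example, text mode silently translates line endings on read, while binary mode hands back the bytes untouched. A small sketch:

```python
import os
import tempfile

# Write raw CRLF bytes, then compare what binary mode vs text mode sees.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write(b"line one\r\nline two\r\n")

with open(path, "rb") as f:   # binary mode: bytes come back untouched
    raw = f.read()
with open(path, "r") as f:    # text mode: CRLF is translated to "\n" on read
    text = f.read()

print(raw)   # b'line one\r\nline two\r\n'
print(text)  # 'line one\nline two\n'
```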
Won't Git do the same depending on config? There were some Windows line feed related switches in Git.
I don't have Windows, and don't care, so I don't know for sure what they are good for.
Ah, here we go:
https://stackoverflow.com/questions/1967370/git-replacing-lf-with-crlf
So this seems common, and expected behavior.
The solutions to such issues is always the same: Just don't use Windows. 😂
I wish git could be the same.
If you access the same repository from Windows and Linux (e.g. WSL or a VirtualBox shared folder), you will have trouble because of CRLF/LF issues.
While it is possible to ignore it in git diff, I have not found a workaround for git commit.
Do git config --global core.autocrlf true on your Windows machine.
It replaces LF with CRLF when checking out files, and CRLF with LF when committing files. The result is that files in the repository will only have LF, while the files in your working copy on Windows will have CRLF.
Which is an annoyance when you have a linter that converts everything to LF: your git status gets spammed with all the files it changed, and when you commit, only the files with actual content changes get committed while the rest disappear from the status list.
I just do lf everywhere
Fellow Perforce user. I have been burned more than once by it. I feel your pain. I feel it deeply and so so much.
Hell froze over when Microsoft added Unix line endings to Notepad, in 2018.
This is so annoying — and unnecessary
I forget the details now, but there was a time I worked on some Rube Goldberg of a project where everything was home-rolled including the checksum generation. (No Git, no common CI platform, it was spaghetti made out of corn flakes)
When the checksum job was added, somewhere along the pipe LF was converted to CRLF or vice versa, and code written by Windows users would bork the checksum generation because the checksum was being calculated over a different bytestream.
I don't put that job on my resume.
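For what it's worth, that failure mode is easy to reproduce: checksum the same "text" with the two line endings and the digests don't match. A minimal sketch in Python:

```python
import hashlib

unix = b"print('hello')\n"           # LF-only, as a Linux editor would save it
dos  = unix.replace(b"\n", b"\r\n")  # the same text after CRLF conversion

# Same text to a human, different bytestream to a checksum.
print(hashlib.sha256(unix).hexdigest())
print(hashlib.sha256(dos).hexdigest())
print(hashlib.sha256(unix).digest() == hashlib.sha256(dos).digest())  # False
```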
I dunno man, on every new file git tells me it will convert CRLF to LF and I can still read the files fine (windows 11).
Windows users with stockholm syndrome will still tell you it works fine for programming.
It all comes back to the question of whether "enseno der Lowe\r\s\s\s\s~\s\s\s\s\s\s\s\"" is the right way to encode "enseño der Löwe" when sending it to the printer...or whether you should use backspaces instead of carriage returns.
People did that shit back in the '80s, back when every character was obviously the same width, and obviously nobody cares what's on the computer, it's what's on the resulting piece of paper that matters.
The CR LF sequence originated far earlier than the '80s. It's present in the ASCII definition from 1963, which in turn derives from ITA-2, which is based on Murray code (the first to introduce CR and LF), which itself was based on Baudot code:
https://en.wikipedia.org/wiki/Baudot_code
So CR LF as a formal definition in a protocol tracks back to at least 1901, from my (quick) research.