10000 lines of logs, rookie numbers. I was once given 400000 lines of customer data and told to find a pattern of discrepancies based on the logs. Both files were 400000 lines. Python can't be used at my company for security reasons, since it was financial data, so I used Java for the regex. Edited: loc from 1000 -> 10000
Wait, I've never heard about Python not being used due to security concerns, could you expand?
I suppose it could be dependency injection and the greater potential for breaking out of restricted environments.
Also, it's an interpreted language which is a bit less safe than a straight compile.
Also also, python is what they use for the most common hacking tools. Has good potential for privilege escalation.
What does dependency injection have to do with this?
Probably meant something more along the lines of a supply chain attack. A malicious actor putting bad code into a commonly used library or the dependency of a common library, etc. Happened on NPM not too long ago. Someone took over ownership of a library then snuck code in. It was obviously caught but that's not always a guarantee before it does damage. We put a lot of trust in pypi being safe. The better way to avoid this is to host an internal pypi mirror and only approve libraries that pass analysis or just ban use of non-core python modules but some companies go ham-fisted instead I guess.
Probably more "hasn't been approved" than "has been banned".
That makes more sense lol
They had restrictions, plus I only had like 4 months of experience in Java. I was a fresher and I was crying 🥲.
Bravo, you pulled it off beautifully.
Sounds like it might be a fintech company, in which case, do not expect there to be a logical, modern, coherent reason.
I consulted for 14 years and will never do fintech again unless it’s a scrappy consumer-focused org with a low headcount. At one company, to work on their iOS code, I had to remote from a perfectly good Mac to a Windows machine in the cloud to another Mac. In New Zealand.
Guess no Python interpreter made it into the corporate whitelist?
It's a lot of work to make Python function in a whitelist security policy environment. Approving PyCharm is one thing, but you'd have to maintain an internal PyPI mirror with individually approved packages, and that's where an understaffed corporate infosec department would likely nope out.
Wonder if PyPI-whitelisting-as-a-service could be a viable business model.
How does Python pose a security risk?
Don't try to reason with corporate
Every python function call you make is sent to a private server where Roko’s Basilisk reads and learns. Why did you think the language is called Python?
Maybe it was not validated in that environment and thus they could not know if it posed a security risk or not?
But the mere presence of a programming language being deemed a security risk is what’s interesting to me. If Python is said to be a risk then why not Java?
They’re aaaall a security risk, honestly. Nothing unique about python. Unless maybe the fact that anti-virus programs can’t really analyze code as well as they can a compiled executable.
That's what I was told, I was not allowed to use python.
Supply chain attacks can and do happen regularly against python's pypi which is why management would restrict the use of it.
It says 10000 though not 1000
My bad, didn't notice 😅
Well, if you have an idea of what you are looking for, or at least when to look, no problemo.
Otherwise, just take the day off and tell them you found nothing.
EOD stands for ‘End of Dignity’ when dealing with 10000 lines of code
I don’t have imposter syndrome. These posts are made by imposters.
Damn the low quality effort is getting worse
your grep game is weak
10000 lines of log files is easy mode. They probably have a reasonable text encoding, line breaks and everything..
SRE here. My applications in PROD log millions of lines per hour and we keep them for 6 weeks. Not that hard to analyse if you use the right tools. IMHO this is a skill issue.
10 M well structured lines can be easier than 10 k ad hoc lines.
A friend developed a "language" for highlighting in Notepad++, so he could collapse the stacks in the logs. After that, he scrolled through the logs via the preview and looked for any unusual pattern, like longer or shorter lines.
Humans are……
Depends what you mean by analysis. Really, whatever you're doing it shouldn't matter if it's 10k lines or 10 million lines, you just filter out the noise and either find the exact logs you're looking for, or write a script to extract the data your boss wants.
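For the "write a script to extract the data" route, here is a minimal sketch in Python. It assumes a made-up line format of `<HH:MM:SS> <LEVEL> <message>` with `key=value` pairs in the message; the names `parse`, `count_by_level`, and `field`, and the sample lines, are all invented for illustration, so adjust the regex to whatever your logs actually look like.

```python
import re
from collections import Counter

# Assumed line format: "<HH:MM:SS> <LEVEL> <message>". Real logs will differ.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)

def parse(lines):
    """Yield a dict per line that matches the assumed format; skip the rest."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            yield match.groupdict()

def count_by_level(lines):
    """Count parsed lines per log level."""
    return Counter(record["level"] for record in parse(lines))

def field(message, name):
    """Pull a key=value field out of a message, or None if it is absent."""
    match = re.search(rf"\b{re.escape(name)}=(\S+)", message)
    return match.group(1) if match else None

# Hypothetical sample data, invented for this example.
sample = [
    "09:15:00 INFO user=alice action=login",
    "09:15:02 ERROR user=bob action=payment reason=timeout",
    "garbage line without structure",
    "09:15:07 WARN user=alice action=retry",
    "09:15:09 ERROR user=carol action=payment reason=declined",
]

levels = count_by_level(sample)
error_users = [
    field(record["message"], "user")
    for record in parse(sample)
    if record["level"] == "ERROR"
]
print(levels)
print(error_users)
```

Because `parse` is a generator, the same code works line by line on a file handle, so it should not care much whether the input is 10k or 10 million lines.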
So this happened to me. And while my manager was showing logs to my junior and me, asked us to analyse the logs and find the problem by EOD.
I was losing my shit like how can you expect us to find it in less than 5 hours. And he was saying bs like you can do it. You got to believe in yourself.
I saw the issue, I found the bug. And I asked him to stop.
And he with pride said this is why I come to you.
I knew I had done myself dirty.
Your junior asked you to have it done by EOD?
A missing comma can cause disasters. Thank you.
Stick them into Logs Insights in CloudWatch -> find patterns -> check weird patterns
Make your machine analyze it for you dummy. If you don’t spend 4 hours automating a 10 minute task, can you even call yourself a software engineer?
Or are we dancers?
grep | awk
And do some magic
can’t believe grep is so far down this thread.
Laughs maniacally in regex
ChatGPT or other AI
Better hope there’s nothing confidential in those logs huh
Businesses can get enterprise accounts so they can use these tools without their data becoming future training data.
Exactly. It has saved me so much time
Small log files (<100,000 lines) I just search through in VS Code, but for any large ones I strongly recommend https://github.com/variar/klogg That will open a multi-gigabyte text file with no problem.
I'll just feed it to LLM hahahaha
Trace32.exe -> look for red -> LGTM
Depends on the logs. I was at the customer's site and had to analyze 600k Wireshark packets. Reproduced and found the error in a few minutes. Filtering is the key.
Ctrl c Ctrl t c h a t g p t . c o m enter Ctrl v enter
Microsoft Log Parser if you want a quick way to use SQL queries against CSV and XML files
upload to copilot/chatgpt, ask it to generate an RCA report, and email. /s
My Netflix translation said - "We are not horses".
Tell me you don't know how to use regex without telling me you don't know how to use regex.
10000 lines isn't a terrific amount. Humans are good at pattern matching. If you can't see the issues skimming through them at speed (a lot will depend on how familiar you are with the logs), it's time to break out grep.
I used to be able to track down most errors within about 15-20 minutes, in logs that gzip compressed down to the tens of gigabytes, just leveraging less and grep. The process for me tends to go:
cat file | grep "error" | less
Then look for error level logs (adjusting that grep string to match the format of the logs). If there are too many, and/or lots of them are irrelevant, filter them out.
cat file | grep "error" | grep -v "<string matching what I don't care about>" | less
Rinse and repeat, filtering more and more lines, through successive greps (or using regex OR syntax), until I find what I want. If I don't find what I want, go back to basics, start looking at the full logs, and grepping out irrelevant lines. It's amazing how quickly you'll be able to cut out irrelevant information from the logs just by filtering out what you can quickly identify as good.
One final important thing: Once you've found what you think is the error, go back to the raw logs, find the line in there, and look around for context. It's amazing how infrequently people seem to do this, and how often I've found the real problems that way.
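If grep isn't available, the same include-then-exclude loop and the "go back for context" step can be roughly sketched in Python. This is only an illustration: `filter_lines`, `with_context`, the sample log, and the patterns are made up for the example, and the matching here ignores case (closer to `grep -i` than to the plain `grep` above).

```python
import re

def filter_lines(lines, include, excludes=()):
    """Return (index, line) pairs matching `include` and none of `excludes`.

    Roughly: grep -i include | grep -v -i exclude1 | grep -v -i exclude2 ...
    """
    inc = re.compile(include, re.IGNORECASE)
    exc = [re.compile(pattern, re.IGNORECASE) for pattern in excludes]
    return [
        (i, line)
        for i, line in enumerate(lines)
        if inc.search(line) and not any(e.search(line) for e in exc)
    ]

def with_context(lines, index, radius=2):
    """Return the raw lines around `index`, similar to grep -C radius."""
    start = max(0, index - radius)
    return lines[start:index + radius + 1]

# Hypothetical sample log, invented for illustration.
sample_log = [
    "10:00:01 INFO  service started",
    "10:00:02 ERROR healthcheck timeout (known, harmless)",
    "10:00:03 INFO  loading config",
    "10:00:04 WARN  config value missing, using default",
    "10:00:05 ERROR cannot connect to database",
    "10:00:06 INFO  retrying",
]

# First pass: everything mentioning "error", minus the noise we already know.
hits = filter_lines(sample_log, "error", excludes=["healthcheck"])
for index, line in hits:
    print(line)
    # Then go back to the raw lines and look around the hit.
    for context_line in with_context(sample_log, index, radius=1):
        print("    " + context_line)
```

Keeping the line index with each hit is what makes the last step cheap: you can jump straight back into the unfiltered log and read the neighbours, which in this made-up sample is where the WARN about the missing config value sits.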
Analyze before end of decade? You've got plenty of time
Logs are meant to be human readable though. It shouldn't matter how much there is. You're usually looking for an event or a time frame. It's easy to find with Ctrl+F.
One of the perfect use cases for AI but this sub will just call me a vibe coder
Grep
using AI to analyze data is very different from using it to generate code
I lost a position that was legit 80% reading through logs and trying to figure out what went wrong. It was awful
[deleted]
The position
Type "Error"
No results found
Send a message back: "Looks good to me"
It isn't that complicated
I couldn't find the Ctrl key
Presses C + T + R + L instead
"Why doesn't this work?"
I did that in high school (to be fair, it was my 7th class of the day and I was on autopilot). I had to press Ctrl+F5, but I pressed Ctrl+F+5. The teacher even said people like me usually fail the class. I got an A and work in programming just to show her.
Your teacher was just an ass. It’s a funny mistake; happens to the best of us.
Lol
That's an actually good app idea, if a bit demented
CMD/CTRL
You could search for "Warning" too if you're feeling spicy.
Nah, it is the reason why there are 10000 lines in the first place.
🤣🤣 +1, plus if the logs are bad, the devs are bad
Error ERROR Exception Fatal Fuck
Sometimes I search for “failed”
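Rather than typing each of those searches one at a time, a single case-insensitive pass can tally them all. A minimal Python sketch, assuming plain-text lines: the keyword list is lifted from the comments above (minus the profanity), and `keyword_counts` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Keywords taken from the comments above; extend the list to suit your logs.
KEYWORDS = ["error", "exception", "fatal", "failed", "warning"]
KEYWORD_RE = re.compile(
    "|".join(re.escape(keyword) for keyword in KEYWORDS), re.IGNORECASE
)

def keyword_counts(lines):
    """Count how many lines mention each keyword, ignoring case."""
    counts = Counter()
    for line in lines:
        # A set, so a keyword repeated within one line counts once for it.
        found = {match.group(0).lower() for match in KEYWORD_RE.finditer(line)}
        counts.update(found)
    return counts

# Hypothetical sample lines, invented for this example.
sample = [
    "Error: disk full",
    "ERROR while writing; write failed",
    "all good",
    "Warning: retrying after error",
    "Unhandled Exception in worker",
]

counts = keyword_counts(sample)
print(counts)
```

A tally like this only tells you where to start looking; a keyword with zero hits is not proof that nothing went wrong.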