Hacker News Post

Discussion

saeedesmaili

After reading this I realized I also have an archive of my pocket account (4200 items), so tried the same prompt with o3, gemini 2.5 pro, and opus 4:

- chatgpt UI didn't allow me to submit the input, saying it's too large, even though it was around 80k tokens, less than o3's 200k context size.

- gemini 2.5 pro: worked fine for personality and interest related parts of the profile, but it failed on the age range, job role, location, and parental status, with incorrect predictions.

- opus 4: nailed it and did a more impressive job, accurately predicted my base city (amsterdam), age range, relationship status, but didn't include anything about if I'm a parent or not.

Both gemini and opus failed in predicting my role, probably understandably. Although I'm a data scientist, I read a lot about software engineering practices because I like writing software and since I don't have the opportunity at work to do this kind of work, I code for personal projects, so I need to learn a lot about system design, etc. Both models thought I'm a software engineer.

Overall it was a nice experiment. Something I noticed is both models mentioned photography as my main hobby, but if they had access to my youtube watch history, they'd confidently say it's tennis. For topics and interests that we usually watch videos about rather than read articles about, it would be interesting to combine the youtube watch history with this pocket archive data (although it would be challenging to get that data).

alexnorton

I was able to give this a try on every YouTube video I've ever watched by exporting the history from Google Takeout:

https://takeout.google.com/settings/takeout/custom/youtube?p...

And then a combination of pup and jq to parse the video titles from the HTML file:

  cat watch-history.html \
    | pup '.outer-cell .mdl-grid .content-cell:nth-child(2) json{}' \
    | jq -r '.[] .children[0] | select(.tag != "br") | select(.text | startswith("https://www.youtube.com/watch?v=") | not) | .text' \
    > videos.txt

juliendorra

You should be able to use Google Takeout to get all of your YouTube data, including your watch history.

This article is a nice example of someone using it:

> When I downloaded all my YouTube data, I’ve noticed an interesting file included. That file was named watch-history and it contained a list of all the videos I’ve ever watched.

https://blog.viktomas.com/posts/youtube-usage/

Of course, for Europeans it's a legal obligation for companies to give you access, but I think Google Takeout works worldwide?

yubblegum

This can give a false sense of what Google (Alphabet) actually knows about you. That above is Google playing the game of 'ok, here is what we know of your activities on youtube when logged in!'

But Google and the rest of the "advertising" (euphemism for surveillance) industry track and create "profiles" based on a basket of data points, from ip/MAC address to the rest of their bag of tricks.

dietr1ch

Internally at Google a toy tool to peek into your own personal advertisement profile was released and taken down within a week or two because it was creepily knowledgeable about you.

ariwilson

when?

jazzyjackson

Yes, I've done this in the USA. Pretty neat. I have it on my todo list to parse over it and find all the music videos I've watched 3 or more times to archive them.
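
For the "watched 3 or more times" part, here is a minimal sketch in Python. It assumes a plain-text file with one video title per line, such as the videos.txt produced by the pup/jq pipeline earlier in the thread. The function names and the file name are hypothetical, titles are not unique video IDs, and nothing here tells music videos apart from other videos.

```python
from collections import Counter
from typing import Iterable


def repeated_titles(titles: Iterable[str], min_count: int = 3) -> list[tuple[str, int]]:
    """Return (title, count) pairs for titles seen at least min_count times."""
    # Strip whitespace and skip blank lines so stray newlines are not counted.
    counts = Counter(t.strip() for t in titles if t.strip())
    # most_common() orders by count, highest first.
    return [(title, n) for title, n in counts.most_common() if n >= min_count]


def load_titles(path: str) -> list[str]:
    """Read one title per line from a text file (hypothetical videos.txt)."""
    with open(path, encoding="utf-8") as fh:
        return fh.read().splitlines()
```

Something like `repeated_titles(load_titles("videos.txt"))` would then give the shortlist to hand to a downloader.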

toomuchtodo

https://archive.zhimingwang.org/blog/2014-11-05-list-youtube... might be of use along with https://github.com/yt-dlp/yt-dlp, might just grab it all and prune later due to rot and availability issues over time within YT.

viraptor

It is available and it can be surprisingly large. I've somehow accumulated multiple GB of data from YT alone. Which feels a bit absurd - there's bound to be lots of waste there.

tehlike

You should take this as a sign, and shoot for SWE jobs - given your interest.

What you do at work today doesn't mean you can't switch to a related ladder.

justusthane

Sometimes it’s nice for hobbies to remain hobbies

cortesoft

I believed this, which is what made me avoid computer science in college; I wanted to avoid ruining my favorite hobby.

After a few years post graduation, where I wasn't sure what I wanted to do and I floundered to find a career, I decided to give software development a try, and risk ruining my favorite hobby.

Definitely the best decision I could have made. Now people pay me a lot of money to do the thing I love to do the most... what's not to love? 20 years later, it is still my favorite hobby, and they keep paying me to do it.

p1necone

I think it heavily depends on who you're working for.

If they get out of the way and let you do the thing you love how you want to do it you'll get good results for you and them.

If they treat you like a cog in a machine and assume they need to carrot and stick you into doing things because you might not really want to be there, you'll be miserable.

cortesoft

I have worked a few places at many different positions over an 18 year career so far.

I have enjoyed the programming part of all the jobs. I don’t really care what the problem is, I just like using computers to solve problems.

justusthane

Sure, of course. Sometimes it works out to follow your passion into a career. I was objecting to the apparent premise that that’s _always_ what you should do.

8n4vidtmkvmk

My first software job I enjoyed. My 2nd/current job I enjoy everything except the actual work. Too much bureaucracy, but it hasn't ruined my love for the craft yet. Oh well, I'm building some other skills I didn't know I had in me.

formerphotoj

Exactly this. The need to make money from a thing may well eliminate the value one derives from the thing, and even add negatives such as stress, etc.

abrookewood

100%. I am absolutely certain that I do not have a viable career as a professional surfer ... no matter how much I wish it wasn't true.

sea-gold

https://english.stackexchange.com/questions/25225/ways-to-ru...

smt88

I love reading about cooking but I'd hate to become a cook

greenavocado

You need to use an iterative refinement pyramid of prompts. Use a cheap model to condense the majority of the raw data in chunks, then increasingly stronger and more expensive models over increasingly larger sets of those chunks until you are able to reach the level of summarization you desire.
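
A rough sketch of that pyramid in Python, under some assumptions: each "model" is just a callable that takes text and returns a shorter text (how it calls an actual LLM is left out on purpose), the group size is fixed rather than growing per level, and all names are hypothetical.

```python
from typing import Callable, Sequence

# A summarizer is any function from text to (shorter) text.
# In practice each one would wrap a different model; that part is assumed.
Summarizer = Callable[[str], str]


def chunked(items: Sequence[str], size: int) -> list[list[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def pyramid_summarize(items: Sequence[str], tiers: Sequence[Summarizer], fan_in: int = 50) -> str:
    """Condense items level by level.

    tiers[0] (the cheapest model) sees the raw data; each later level uses
    the next tier, and the last tier is reused once the list runs out.
    """
    if not tiers:
        raise ValueError("need at least one summarizer")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    if not items:
        return ""
    level = list(items)
    depth = 0
    while True:
        summarize = tiers[min(depth, len(tiers) - 1)]
        # Each group of up to fan_in texts is condensed into one summary.
        level = [summarize("\n".join(group)) for group in chunked(level, fan_in)]
        depth += 1
        if len(level) == 1:
            return level[0]
```

With 4200 saved items and `fan_in=50`, this would be 84 cheap calls on the raw data, then 2 calls and 1 final call on the stronger tiers.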

larve

re o3: you can zip the file, upload it, and it will use python and grep and the shell to inspect it. I have yet to try using it with a sqlite db, but that's how i do things locally with agents.

saeedesmaili

The author mentions that doing that didn't give them a high-quality response. Adding the text to the model's context makes all the information available for it to use.

tgtweak

I think a reasoning/thinking-heavy model would do better at piecing together the various data points than an agentic model. Would be interested to see how o3 does with the context summarized.

saeedesmaili

Agreed, that's why I used reasoning models (gemini 2.5 pro and opus 4 with extended thinking enabled).

datpuz

Reading 80k tokens requires more than 80k tokens due to overhead

LoganDark

> Both models thought I'm a software engineer.

You probably still are, even if that's not your career path :)

UrineSqueegee

o3 on the webui has a tiny context as do all the models

elcapitan

The main thing I learned from my pocket export is that 99% of the articles were "unread". Not sure if it would make sense to extrapolate something about myself other than obsessive link hoarding from this. :D

gavmor

For many years I've used Pocket to give myself permission to get back to work.

internet_points

Me too! I kind of wish I didn't know it was shutting down, and they just replaced the button with something that saves it to /dev/null without ever telling me.

bryancoxwell

Well, read or not you saved those links for a reason

sandspar

Perhaps comparing your read/unread might tell something about your revealed vs stated preferences. I assume that the typical person's unread pile is mostly aspirational. I'm sure that there's lots of data on this - for example Amazon's recommendation graph may weigh our Wishlist items differently than our Purchased items.

elcapitan

I'm sure if you look long enough, you can find any pattern you want, and the opposite ;)

jackdawed

I've noticed a lot of people are converging on this idea of using AI to analyze your own data, the same way the companies do it to your data and serve you super targeted content.

Recently, I was inspired to do this on my entire browsing history, after reading https://labs.rs/en/browsing-histories/ I also did the same from ChatGPT/Claude conversation history. The most terrifying thing I did was having an LLM look at my Reddit comment history.

The challenges are primarily with having a context window large enough and tracking context from various data sources. One approach I am exploring is using a knowledge graph to keep track of a user's profile. You're able to compress behavioral patterns into queryable structures, though the graph construction itself becomes a computational challenge. Recently most of the AI startups I've worked with have just boiled down to "give an LLM access to a vector DB and knowledge graph constructed from a bunch of text documents". The text docs could be invoices, legal docs, tax docs, daily reports, meeting transcripts, code.

I'm hoping we see an AI personal content recommendation or profiling system pop up. The economic incentives are inverted from big tech's model. Instead of optimizing for engagement and ad revenue, these systems are optimized for user utility. During the RSS reader era, I was exposed to a lot of curated tech and design content and it helped me really develop taste and knowledge in these areas. It also helped me connect with cool, interesting people.

There's an app I like https://www.dimensional.me/ but the MBTI and personality testing approach could be more rigorous. Instead of personality testing, imagine if you could feed a system everything you consume, write, and do on digital devices, and construct a knowledge graph about yourself, constantly updating.

DavidPeiffer

>During the RSS reader era, I was exposed to a lot of curated tech and design content and it helped me really develop taste and knowledge in these areas.

Man, you helped me realize how much the RSS era helped me out. I followed so many different sources of articles and had them roughly prioritized by my interest in them. It was really helpful reading thousands of articles and developing better and better mental models of how technology works while I was in high school. A lot has changed, but many of the mental models are still pretty accurate and handy for branching off and diving in deeper where I'm interested.

nottorp

> Instead of optimizing for engagement and ad revenue, these systems are optimized for user utility.

Are they, or instead they will help keeping you in your comfort cage?

Comfort cage is better than engagement cage ofc, but maybe we should step out of it once in a while.

> During the RSS reader era, I was exposed to a lot of curated tech and design content and it helped me really develop taste and knowledge in these areas.

Curated by humans with which you didn't always agree, right?

janalsncm

> Are they, or instead they will help keeping you in your comfort cage?

I’ve been paying close attention to what YouTube shorts/tiktok do. They don’t just show you the same genre or topic or even set of topics. They are constantly in an explore-exploit pattern. Constantly trying to figure out the next thing that’ll keep your attention, show you a bunch of that content, then on to the next thing. Each interest cluster builds towards a peak then tapers off.

So it’s not like if you see baking videos it’ll keep you in that comfort zone forever.

nottorp

But you're describing the engagement cage, while I'm just pointing that you need to be careful not to escape from it just to get trapped inside the comfort cage.

Karrot_Kream

Is there a rigorous definition of "engagement cage" or is the term just HN-tuned engagement bait? (:

Fundamentally content discovery is always an explore-exploit loop. What you tune the loop with is what makes it useful for any given purpose.
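
To make the explore-exploit loop concrete, here is a textbook epsilon-greedy sketch in Python. It is not a description of what YouTube or TikTok actually run; the class name, the topic labels, and the reward signal (say, watch time scaled to 0..1) are all assumptions. The "what you tune the loop with" part is the reward you feed into `record`.

```python
import random


class EpsilonGreedyRecommender:
    """Pick a topic: explore at random with probability epsilon, else exploit."""

    def __init__(self, topics, epsilon=0.1, rng=None):
        self.topics = list(topics)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.shown = {t: 0 for t in self.topics}      # times each topic was shown
        self.reward = {t: 0.0 for t in self.topics}   # total reward per topic

    def mean_reward(self, topic):
        n = self.shown[topic]
        return self.reward[topic] / n if n else 0.0

    def pick(self):
        if self.rng.random() < self.epsilon:
            # Explore: try any topic, including ones that look bad so far.
            return self.rng.choice(self.topics)
        # Exploit: show the topic with the best average reward so far.
        return max(self.topics, key=self.mean_reward)

    def record(self, topic, reward):
        """Feed back the signal being optimized (engagement, usefulness, ...)."""
        self.shown[topic] += 1
        self.reward[topic] += reward
```

Swapping the reward from "seconds watched" to something like "did I mark this as useful" changes what the same loop converges on.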

jackdawed

That's the core challenge in designing a system like this. Echo chambers and comfort cages emerge from recommendation algorithms, and before that, from lazy curation.

If you have control over the recommendation system, you could deliberately feed it contrarian and diverse sources. Or you could choose to be very constrained. Back in RSS days, if you were lazy about it, your taste/knowledge was dependent on other people's curation and biases.

Progress happens through trends anyway. Like in 2010s, there was just a lot of Rails content. Same with flat design. It wasn't really group think, it just seemed to happen out of collective focus and necessity. Everyone else was talking/doing this so if you wanted to be a participant, you have to speak the language.

My original principle when I was using Google Reader was I didn't really know enough to have strong opinions on tech or design, so I'll follow people who seem to have strong opinions. Over time I started to understand what was good design, even if it wasn't something I liked. The rate of taste development was also faster for visual design because you could just quickly scan through an image, vs with code/writing you'd have to read it.

I did something interesting with my Last.fm data once. I've been tracking my music since 2009. Instead of getting recommendations based on my preferences, I could generate a list of artists that had no or little overlap with my current library. It was pure exploration vs exploitation music recommendation. The problem was once your tastes get diverse enough, it's hard to avoid overlaps.
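
One way such a filter could look, as a Python sketch. The comment doesn't say how overlap was measured, so this assumes each candidate artist comes with a set of "similar artists" (hypothetical data, however it is obtained) and scores overlap as the share of those that are already in the library.

```python
def overlap(similar: set[str], library: set[str]) -> float:
    """Fraction of an artist's similar artists that are already in the library."""
    if not similar:
        return 0.0
    return len(similar & library) / len(similar)


def unfamiliar_artists(
    candidates: dict[str, set[str]],
    library: set[str],
    max_overlap: float = 0.0,
) -> list[str]:
    """Return candidates outside the library, least overlapping first."""
    scored = [
        (overlap(similar, library), artist)
        for artist, similar in candidates.items()
        if artist not in library  # skip artists already listened to
    ]
    return [artist for score, artist in sorted(scored) if score <= max_overlap]
```

The problem described above shows up directly here: the larger and more varied `library` gets, the fewer candidates stay under any given `max_overlap`.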

fudged71

I’ve been really interested in stuff like this recently. Not just Pocket saves but also meta analysis of ChatGPT/Gemini/Claude chat history.

I’ve been using an ultra-personalized RSS summary script and what I’ve discovered is that the RSS feeds that have the most items that are actually relevant to me are very different from what I actually read casually.

What I’m going to try next is to develop a generative “world model” of things that fit in my interests/relevance. And I can update/research different parts of that world model at different timescales. So “news” to me is actually a change diff of that world model from the news. And it would allow me to always have a local/offline version of my current world model, which should be useful for using local models for filtering/sorting things like my inbox/calendar/messages/tweets/etc!

nsypteras

A while back I made a little script (for fun/curiosity) that would do this for HN profiles. It’d use their submission and comment history to infer a profile including similar stuff like location, political leaning, career, age, sex, etc. Main motivation was seeing some surprising takes in various comment threads and being curious about where they might have come from. Obviously no idea how accurate the profiles were, but it was similarly an interesting experiment in the ability of LLMs to do this sort of thing.

nozzlegear

> Main motivation was seeing some surprising takes in various comment threads and being curious about where they might have come from.

It'd be interesting to run it on yourself, at least, to see how accurate it is.

mywittyname

I remember this. It was pretty accurate for myself, if a little saccharine (i.e., it said I was going to save the world, or some such).

morkalork

Someone recently did this to predict what would hit the HN front page based on article content + profiles of users.

nsypteras

That's pretty cool! Now I can imagine a tool that gives you a prediction before you even post and then offers suggestions for how to increase performance...

tencentshill

And now we see how easy it is to astroturf any given post, and that's without any budget.

morkalork

Gotta hand it to SamA for not only selling the problem but also trying to cash out on the solution (verified human via creepy orb eyeball blockchain thingy)

swyx

link please? if you can find it

morkalork

https://news.ycombinator.com/item?id=44302355

benjaminoakes

Something I've been working on: https://getoffpocket.com

I hope it can help you

HappMacDonald

I'm not on pocket because I still don't know what it is. Just that it's yet another in a string of services my various flavors of web browser have tried pitching to me over the decades, and because it's hosted by a third party and apparently piquantly of interest to somebody else that I use it I tend to pass.

Although I am at least morbidly curious: what is it even?

My best hot take guess is "it's bookmarks, but probably even less useful somehow".

apparent

It's a read later service, which was originally called Read It Later. I guess they should have stuck with the old name, if they wanted to increase transparency for newcomers.

It's not less useful than bookmarks, of course. It's more useful because they fetch and save the content, and they present it in a reader-friendly (ad-free) viewer.

asveikau

As someone with a family background of more left leaning Catholics (which I think are more common in the US northeast), it's interesting that it decided that you are conservative based on Catholicism.

CGMthrowaway

I would say in aggregate, both Catholics and Protestants (whichever flavor) are more likely to be liberal in the northeast / west coast and more likely to be conservative in the midwest / south. Which tells you something about the average importance of religion in 2025.

asveikau

I think it's older than 2025 and definitely has a piece of it that is specific to Catholics. I tend to think of northeastern American Catholicism from the lens of immigration. The big waves of Italians, Irish, Eastern Europeans, etc. The immigrant identity often led to left leaning economics and the parts of Christianity which are about helping the poor get emphasized.

CGMthrowaway

Idk how much experience you have with catholics outside of the northeast. I have a fair amount with all of the regions I mentioned (northeast, south, midwest, west coast). You cannot really find any American Catholic parish that is not dominated by at least one of Italians, Irish, Eastern Europeans or Hispanics. The catholic church in the US is mostly "immigrants," that is, people whose ancestors were not in the US prior to ~1850

KoolKat23

i.e. are you a charitable catholic or a prudish catholic.

burnte

Born in Pittsburgh, raised Catholic, pretty darn liberal. We had altar girls in the 90s, openly gay members who had ceremonies in the church, etc. I'm not catholic now but that was a good church in the 80s and 90s.

ycombinete

There’s a quip I heard recently that the most Protestant Christians are American Catholics.

cgriswald

To be fair, it actually said:

> Fiscally conservative / civil-libertarian with traditionalist social leaning

And justified it with:

> Bogleheads & MMM frugality + Catholic/First Things pieces, EFF privacy, skepticism of Big Tech censorship

First Things in its current incarnation is all about religious social conservatism. If someone is Catholic and reads First Things articles, "conservative" is a pretty safe bet.

However, I think profiling people based on what they read might be a mistake in general. I often read things I don't agree with and often seek out things I don't agree with both because I sometimes change my mind and because if I don't change my mind I want to at least know what the arguments actually are. I do wonder, though, if I tended to save such things to pocket.

pyuser583

First things is also pretty high brow. If you’re interested in poetry, classics, etc it had a lot to offer.

I’m sure any profiler would be very confused by my reading history, but I really, really like poetry and Plato. So New Yorker, Atlantic, First Things, N+1.

kixiQu

I have a hypothes.is account where a decent amount of my annotations are little rage nits against the thing I'm reading. You'd be able to infer a ton of correct information from me if you pulled the annotations as well as the URLs, but the URLs alone could mislead.

I've had to remind myself of this pattern with some folks whose bookmarks I follow, because they'd saved some atrocious stuff – but knowing their social media, I know they don't actually believe the theses.

gorgoiler

Interesting article. Bizarrely it makes me wish I’d used Pocket more! Tangentially, with LLMs I’m getting very tired with the standard patter one sees in their responses. You’ll recognize the general format of chatty output:

Platitude! Here’s a bunch of words that a normal human being would say followed by the main thrust of the response that two plus two is four. Here are some more words that plausibly sound human!

I realize that this is of course how it all actually works underneath — LLMs have to waffle their way to the point because of the nature of their training — but is there any hope to being able to post-process out the fluff? I want to distill down to an actual answer inside the inference engine itself, without having to use more language-corpus machinery to do so.

It’s like the age old problem of internet recipes. You want this:

  500g wheat flour
  280ml water
  10g salt
  10g yeast

But what you get is this:

  It was at the age of five, sitting
  on my grandmother’s lap in the
  cool autumn sun on West Virginia
  that I first tasted the perfect loaf…

sram1337

That is an issue with general use LLM apps like ChatGPT - they have to have wide appeal, so if you want replies that differ from what the average user wants, you're going to have a bad time.

OpenAI has said they are working on making ChatGPT's output more configurable

apsurd

How do you trust the recipe without context?

People say they want one thing but then their actions and money go to another.

I do agree there's unnecessary fluff. But "just give me the recipe" isn't really what people want. And I don't think you represent some outlier take, because really, have you ever gotten a recipe exactly as you outlined, with zero context, and gave a damn to make it?

T0Bi

The biggest cooking / recipe app in Germany (Chefkoch) works perfectly fine for millions of people without any of the fluff. It's a list of ingredients and cooking steps, that's it. I don't know a single person that cooks who doesn't use it regularly.

lan321

> How do you trust the recipe without context?

Ratings or poster reputation.

I often use recipes from a particular chef's website, which are formulated with specific ingredients, steps, and, optionally, a video. I trust the chef since I've yet to try a bad recipe from him.

I also often use baking recipes from King Arthur based on ratings. They're also pretty consistently good and don't have much fluff.

apsurd

Those are good examples. A trusted chef's website can list purely the recipe because it's held within a pre-vetted context. I do this as well.

I'm advocating for the need for those kinds of trust signals. If AI literally just listed ingredients, I wouldn't trust it. How could I?

Brendinooo

> But "just give me the recipe" isn't really what people want.

The structure of recipe sites has less to do with revealed preferences and more to do with playing the SEO game.

marssaxman

> How do you trust the recipe without context?

Well, I just read it. The stakes are not that high!

> have you ever gotten a recipe exactly as you outlined — zero context – and gave a damn to make it?

Of course: there are a great many useful cookbooks written exactly this way.

apsurd

The book is the context! It was published, it has a presumably influential vetted author.

Maybe I am coming off too flippant. I'm just trying to say there's a spectrum between fluff and context. If the AIs literally just gave us answers and a list of recipes, it wouldn't be as useful as with the context backing up where it came from, why this list, and so on.

marssaxman

Well, that's a reasonable point!

Perhaps it also depends on one's approach to cooking. I often read recipes not because I intend to follow them, but to understand the range of variation in the dish before I make my own version. "Somebody liked this enough to bother writing it up" is enough context for that use.

aniviacat

Yesterday I baked some muffins from an internet recipe that had a list of ingredients and four sentences on what to do. They're pretty nice.

dicethrowaway1

FWIW, o3 seems to get to the point more quickly than most of the other LLMs. So much so that, if you're asking about a broad topic, it may abbreviate a lot and make it difficult to parse just what it's saying.

mattmanser

I just add "be concise" to the end. Works pretty well.

I'm no expert, but with the "thinking" models, I'd hope the "be concise" step happens at the end. So it can waffle all it wants to itself until it gives me the answer.

ArturSkowronski

I did something similar when the Pocket shutdown was announced: https://github.com/ArturSkowronski/moltres-pocket-analyzer

I wanted a tool that cleans the data, tags it, and provides a way to analyze it easily with notebooks and migrate it.

I had a lot of "feels" getting through this :)

frou_dh

Another thing one could do with a flat list of hundreds of saved links (if it's being used for "read it later", let's be honest: a dumping ground) is to have AI/NLP classify them all, to make it easy to then delete the stuff you're no longer interested in.

stared

Actually, I am underwhelmed. I mean, a decade ago, with WAY simpler machine learning algorithms (no fancy deep learning, just shallow singular value decomposition and logistic regression, https://www.pnas.org/doi/10.1073/pnas.1218772110), it was possible to predict personality traits from just a few dozen social media likes. A single like is (nomen omen) likely less valuable than a link saved (as links come from a wider and potentially more diverse data set).

Does it mean that AI knows more about us than many of our friends? Yes.

scotty79

> Actually, I am underwhelmed.

LLM understood the verbal assignment and gave an answer "from the top of its head" without performing any specialized analysis.

nikisweeting

If you're trying to get your data out of Pocket be aware their export doesn't include your tags, highlights, or the actual saved article content.

If you want everything including the text archives from sites that have gone down, you need to use an external tool like this one I built: https://pocket.archivebox.io

threecheese

There’s no guarantee this didn’t base the results on just 1/3 of the contents of your library though, right? How can it be accurate if it’s not comprehensive, due to the widely noted issues with long context? (distraction, confusion, etc)

This is a gap I see often, and I wonder how people are solving it. I’ve seen strategies like using a “file” tool to keep a checklist of items with looping LLM calls, but haven’t applied anything like this personally.
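
A minimal sketch of the checklist idea in Python, with assumptions spelled out: `analyze_batch` stands in for whatever makes the LLM call (hypothetical, passed in as a plain function), and the checklist is kept in memory instead of in a file. It only guarantees that every item was sent in some successful call; it cannot prove the model attended to each one inside a batch.

```python
from typing import Callable, Sequence


def profile_with_checklist(
    items: Sequence[str],
    analyze_batch: Callable[[list[str]], str],
    batch_size: int = 100,
    max_rounds: int = 3,
) -> tuple[list[str], list[int]]:
    """Run analyze_batch over small batches, retrying the ones that failed.

    Returns (notes, missed): one note per successful call, plus the indexes
    of items that were never covered after max_rounds attempts.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = list(range(len(items)))  # the checklist: indexes not yet covered
    notes: list[str] = []
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending: list[int] = []
        for start in range(0, len(pending), batch_size):
            ids = pending[start:start + batch_size]
            try:
                notes.append(analyze_batch([items[i] for i in ids]))
            except Exception:
                # Leave these unchecked so the next round retries them.
                still_pending.extend(ids)
        pending = still_pending
    return notes, pending
```

The per-batch notes could then be merged in a final call, and a non-empty `missed` list is the signal that the profile is based on only part of the library.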

gavmor

Maybe we need some kind of "node coverage tool" to reassure us that each node or chunk of the embedding context has been attended to.

cainxinth

I do this to determine if a person I'm talking to online is potentially a troll. I copy a big chunk of their comment and post history into an LLM and ask for a profile.

The last few years, I've noticed an uptick in "concern trolls" that pretend to support a group or cause while subtly working to undermine it.

LLMs can't make the ultimate judgement call very well, but they can quickly summarize enough information for me to.

pixl97

One thing I've seen happen with some of these accounts is they remove a lot of their posts after some period of time.

So they make somewhat consistent 'generic' posts that do not get removed, but do not really convey any signal on their actual views.

Then in their last 24-48 hours there are more political style posts/concern posts that only stick around while the article/post is getting views. Then replies disappear like they've never happened so you can't tell it's an account that exists wholly to manipulate others that has been doing so for months.

Then quite often after a month or two the accounts disappear totally.

cainxinth

When I was a kid, internet trolls were just in it for the lulz. Today, it’s a global industry with nation states participating.

sfink

Perhaps they're farming accounts? As in, the owner creates a whole bunch of accounts and has them build up a generic history. Then the owner "deploys" some of them to pump up a specific issue. I don't know why they remove the posts, but perhaps it's a way of "recycling" an account by cleaning up the dirty work it did and throwing it back into the pool of available accounts?

Come to think of it, I bet the original creator is selling these accounts to someone else who is weaponizing them. Or the creator is renting them: build up a supply, rent them out for a purpose, then scrub them and recycle. Work From Home! Make Money Fast! This is one part of why the internet has gone to hell.

I don't have an explanation for why they'd delete the accounts.

dimitri-vs

I would think you can get pretty accurate results by including the top 10 subreddits they are active in and their last 20 comments (and their score). Comments alone may not be enough, the reaction to them is more telling.
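
A sketch of how that input could be assembled, in Python. The `Comment` shape is hypothetical (fetching the data is left out), and the instruction line reuses the "profile this user" prompt mentioned elsewhere in this thread.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Comment:
    subreddit: str
    body: str
    score: int
    created: float  # Unix timestamp


def build_profile_prompt(comments, top_subreddits=10, recent=20):
    """Build a prompt from the most active subreddits and the latest comments."""
    # Rank subreddits by how many comments the user left there.
    by_activity = Counter(c.subreddit for c in comments)
    top = [name for name, _ in by_activity.most_common(top_subreddits)]
    # Newest comments first, keeping the score so reactions stay visible.
    latest = sorted(comments, key=lambda c: c.created, reverse=True)[:recent]
    lines = [
        "Profile this user.",
        "",
        "Most active subreddits: " + ", ".join(top),
        "",
        "Recent comments (score in brackets):",
    ]
    lines += [f"[{c.score:+d}] r/{c.subreddit}: {c.body}" for c in latest]
    return "\n".join(lines)
```

Keeping the score next to each comment is the point of the suggestion: the model sees both what was said and how it was received.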

cainxinth

I used to try taking different samples, top versus controversial (for redditors), but now that Gemini offers massive context windows, I just grab a huge swath of everything.

tantalor

Honestly asking:

Did you try it on yourself?

What prompt do you use to avoid bias?

cainxinth

Sure I did. It was fairly accurate. The prompt is just “profile this user.”

goopypoop

I asked my "human brain" to profile you but it threw a megalomania error and now everything's stripy and tinted

marknutter

"Concern troll" is usually just a term that people who want zero pushback lob at people who don't agree with them 100% of the time.

mlekoszek

Explain this a bit. I'm interested, but I don't fully understand how you mean this.

HeatrayEnjoyer

That's not at all been the case in my experience.

hubraumhugo

I built a similar tool that profiles/roasts your HN account: https://hn-wrapped.kadoa.com/

It’s funny and occasionally scary

Edit: be aware, usernames are case sensitive

mh-

> Your comments often feature detailed technical explanations or corrections, leading me to believe you're either a deeply passionate technologist or you just love being the smartest person in the room. Probably both, let's be honest.

Absolutely savage.

This is great/hilarious, thank you.

cluckindan

It would be much funnier and/or insightful if it sampled more than the first page of user comments.

Still, spot on:

Predictions

Personal Projects

After a deep dive into archaic data storage, you'll finally release 'Magnetic Tape Master 3000' – a web-based app that simulates data retrieval from a reel-to-reel, complete with authentic 'whirring' sound effects. It'll be a niche hit with historical computing enthusiasts and anyone who misses the good old days of physical media.

thearn4

I feel seen

> Your profile reads like a 'Hacker News Bingo' card: NASA, PhD, Python, 'Ask HN' about cheating, and a strong opinion on Reddit's community. The only thing missing is a post about your custom ergonomic keyboard made from recycled space shuttle parts.

mywittyname

> The only thing missing is a post about your custom ergonomic keyboard made from recycled space shuttle parts.

You know what must be done.

Avicebron

Predictions:

"You'll discover a hitherto unknown HN upvote black hole, where all your well-reasoned, nuanced comments on economic precarity get sucked into oblivion while a 'Show HN: My To-Do List in Rust' gets 500 points."

This is egregious, good job

gavinray

I did some of my favorite users:

https://hn-wrapped.kadoa.com/pjmlp

https://hn-wrapped.kadoa.com/pclmulqdq

https://hn-wrapped.kadoa.com/jandrewrogers

nottorp

Great summary! Nice toy!

Feels like the predictions part picks a few random posts and generates predictions just based on one post at a time though.

mywittyname

I use something similar to profile users on my company platform. Conventionally, LLMs are excessively "nice" and I found that having them "roast" a user does a better job at surfacing important, curious, or contradictory information. Plus, it's pretty funny.

Mossly

Very neat, this kind of classification & sentiment analysis with flavour text is a use case where LLMs really shine.

For whatever reason, I'm getting an error in the Server Components render when trying my username. My first thought was that it might be due to having no submissions, just comments — but other users with no submissions appear to work just fine.

hubraumhugo

it's case-sensitive: https://hn-wrapped.kadoa.com/Mossly?share

panzagl

> You're the resident historical consultant for all things 'failure by arrogance,' always ready to remind everyone that things are, indeed, not getting better.

Finally I am understood.

rafaelmn

> Your comments on cross-platform UI frameworks read like a dating profile: 'I don't care if it's native, as long as it's not GTK+ and doesn't look like programmer art.'

Touché, LLM

gavinray

Doesn't work for me

  > An error occurred in the Server Components render. The specific message is omitted in production builds to avoid leaking sensitive details.

Mossly

Ironically, running your username works for me, but not my own. Maybe you can view it now? https://hn-wrapped.kadoa.com/gavinray?share

gavinray

Pretty funny, I like it! Though the data seems biased towards more recent posts/comments and also submissions.

thanatropism

Is it supposed to be flattering?

qualeed

>After a year of contemplating game engines and existential dread about capitalism, you'll finally start that 2D game. It'll be a minimalist pixel art RPG where the main quest is 'afford insulin' and the final boss is 'the federal minimum wage'.

Amazing.

Thanks!

chrisweekly

haha, it's pretty funny and touches on some valid points... thanks for building & sharing!

flerchin

Really funny!

insane_dreamer

> HN's grumpy, yet insightful, truth-teller

Ouch.

The Roast section was hilariously cutting, and not untrue.

Top Three Technologies: Are these supposed to be my favorites? Or just what I post about? Either way it got them wrong.

Predictions: I didn't know LLMs were capable of that type of sarcasm. Very clever.

Kudos

Ambroos

> We all appreciate the data, but seriously, get a hobby that doesn't involve arguing with a car's GPS.

Brutal, and very accurate. This is great!

coderatlarge

thank you! this thing is pretty funny :)

arkt8

More than Pocket... I really miss del.icio.us, which helped me a lot at the beginning of my programming journey 20 years ago. It was truly social, and generated a lot of well-curated lists of bookmarks that let me discover much content related to what I wanted to learn, much more than Google or Yahoo ever did.

Sadly it was bought by Yahoo just to be discontinued, like many web pearls.

Alifatisk

I did something similar, but for groupchats. You had to export a groupchat conversation into text and send it to the program. The program would then use a local llm to profile each user in the groupchat based on what they said.

Like, it built up knowledge about every user in the groupchat and noted their thoughts on different things, what their opinions were on something, or just basic knowledge of how they are. You could also ask the llm questions about each user.

It's not perfect: sometimes the inference gets something wrong, or less precise embeddings get picked up, which creates hallucinations or just nonsense, but it works somewhat!

I would love to improve on this or hear if anyone else has done something similar

AJ007

There are other good use cases here like documenting recurring bugs or problems in software/projects.

This is a good illustration of why e2e encryption is more important than it's ever been. What were innocuous and boring conversations are now very valuable when combined with phishing and voice cloning.

OpenAI is going to use all of your ChatGPT history to target ads to you, and probably will have the choice to pay for everything. Meta is trying really hard too, and already is applying generative AI extensively to advertisers' creative production.

Ultra-targeted advertising where the message is crafted to perfectly fit the viewer means devices running operating systems incapable of 100% blocking ads should be considered malware. Hopefully local LLMs will be able to do a good job with that.

GMoromisato

If you take the 13 seconds of processing time and multiply by 350 million (the rough population of the US), you get:

~144 years of GPU time.

Obviously, any AI provider can parallelize this and complete it in weeks/days, but it does highlight (for me at least) that LLMs are going to increase the power of large companies. I don't think a startup will be able to afford large-scale profiling systems.

For example, imagine Google creating a profile for every GMail account. It would end up with an invaluable dataset that cannot be easily reproduced by a competitor, even if they had all the data.

[But, of course, feel free to correct my math and assumptions.]
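
For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch in Python. The 13 seconds and the 350 million figure come from the comment above; the 10,000-way parallelism is purely an assumed number for illustration, not something any provider has stated.

```python
SECONDS_PER_PROFILE = 13            # processing time quoted in the post
POPULATION = 350_000_000            # rough US population
SECONDS_PER_YEAR = 365.25 * 24 * 3600

total_seconds = SECONDS_PER_PROFILE * POPULATION
serial_years = total_seconds / SECONDS_PER_YEAR

# Hypothetical degree of parallelism, chosen only for illustration.
PARALLEL_WORKERS = 10_000
wall_clock_days = total_seconds / PARALLEL_WORKERS / 86_400

print(f"{serial_years:.0f} years if run one profile at a time")
print(f"{wall_clock_days:.1f} days across {PARALLEL_WORKERS:,} workers")
```

Under those assumptions the serial figure matches the ~144 years above, and the parallel run finishes in under a week.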

smokel

What will they find out? That we are humans?

fragmede

It seems more reasonable to assume Google's been doing that since before Gmail launched.

IliaLitviak

Recently vibe-coded a web-app that takes your listening history from Apple Music (sad to see Spotify API go) and recommends a variety of different media based on that. Was truly surprised by how OK those recommendations are, given an extremely limited input.

ulf-77723

Tell me what you read and I'll tell you who you are. Even though it might be surprising how much detail the model can give as feedback, it's not so hard to do this as a human, or is it?

From my perspective the most interesting thing might be the blind spots or unexpected results. The unknown unknowns, which bring new aha effects

jaynetics

It's not hard to do this as a human, at least if that human is trained in gathering and transforming written information.

What makes a huge difference here is the ease and speed. I recently did a similar analysis of my HN posts. I have hundreds of posts, and it took like 30 seconds with high quality results. Achieving this quality level would have taken me hours, and I have some relevant experience.

This certainly opens up some new possibilities - good ones like self-understanding, potentially ambiguous ones in areas such as HR, and clearly dystopian ones ...

mettamage

I did it based on my last 1000 HN favorites.

> EU-based 35-ish senior software engineer / budding technical founder. Highly curious polymath, analytical yet reflective. Values autonomy, privacy, and craft. Modestly paid relative to Silicon Valley peers but financially comfortable; weighing entrepreneurial moves. Tracks cognitive health, sleep and ADHD-adjacent issues. Social circle thinning as career matures, prompting deliberate efforts at connection. Politically center-left, pro-innovation with guardrails. Seeks work that blends art, science, and meaning—a “spark” beyond routine coding.

Fairly accurate

"Seeks work that blends art, science, and meaning—a “spark” beyond routine coding."

That part is really accurate.

vladsanchez

What's your Pocket replacement? Wallabag, Hoarder or something else?

noperator

Moved to Wallabag, but note that I don't read anything via Wallabag (or Pocket) UI. I export saved items as an RSS feed which I consume in an RSS reader like Inoreader or FreshRSS.

chrismatheson

I tried Wallabag and ended on Instapaper (again, I think I moved from IP -> Pocket originally). Just didn't get on with the UI / general experience of Wallabag, IP is more attuned to my polished style preferences I guess

Liquix

reading an article if it's interesting and moving on if it's not :~^)

seriously though, i have struggled with tab/bookmark hoarding, it's a huge relief when you recognize it for what it is and quit. IME the bigger/dustier the backlog gets the more vague psychological guilt accumulates, a weight which isn't truly recognized until it's gone.

bonoboTP

As a fellow tab/bookmark (and sessionstore.json and other extension-based export) hoarder, I wonder if our lives could be helped by LLM curation and organization of this mess. Like vague queries against this backlog, like is there anything related to XYZ topic or aspect in the pile? What's the overall composition of this heap of links? How does this composition change over time? Can we plot a 2D scatterplot of all the links with proximity based on semantic/topical similarity?

Or maybe we just need to learn how to prioritize better? Or do some kind of stagewise workflow where the superficial ingestion/collection is followed by multiple steps of culling the less relevant stuff (but still without real deletion, just in case for later). Or perhaps we could now write a sentence of why we think the link may be relevant and what future event or future state of some project or development might make this link gain in relevance again? And then again we could declare that this has happened and what are now the links that are relevant?

I could see some LLM product in this space, but I think this market is fairly niche.

gherkinnn

I'd like a fixed-size queue of 10 links. Adding one more kicks out the oldest, and it can only be read as strict first-in-first-out.
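
A minimal sketch of that idea in Python, assuming an in-memory queue is enough. The `LinkQueue` class and its method names are made up for illustration; the only real library piece is `collections.deque` with `maxlen`, which drops the oldest item when a new one is appended to a full deque.

```python
from collections import deque
from typing import Optional


class LinkQueue:
    """Fixed-size reading queue: strict FIFO, oldest link dropped when full."""

    def __init__(self, capacity: int = 10) -> None:
        # A deque with maxlen discards items from the left when appending on the right.
        self._links = deque(maxlen=capacity)

    def add(self, url: str) -> None:
        # Evicts the oldest entry once the queue is at capacity.
        self._links.append(url)

    def read_next(self) -> Optional[str]:
        # Only the oldest remaining link can be read.
        return self._links.popleft() if self._links else None

    def __len__(self) -> int:
        return len(self._links)


queue = LinkQueue(capacity=10)
for i in range(12):
    queue.add(f"https://example.com/{i}")

print(len(queue))         # 10
print(queue.read_next())  # https://example.com/2
```

Links 0 and 1 are pushed out by the eleventh and twelfth additions, so the first readable item is link 2.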

saeedesmaili

I have moved to Instapaper for now.

gherkinnn

Safari reading list

lou1306

Personally I don't mind Wallabag's, well, barebones interface, and having recently jailbroken my Kindle I think its integration with KOReader is its real killer app. I've found the EPUB conversion works really well (modulo some timeout issues on _very_ heavy pages) and it automatically marks articles as read when you sync with the server.

apples_oranges

All platforms that have user data are running LLMs to build such profiles for their advertisers, I bet.

morkalork

Not just platforms and advertisers, governments too

zkmon

What was it doing for those 13 seconds? Is it fetching content for the links? How many links could it fetch in 13 seconds? Maybe it is going by the link URLs only instead of fetching the link content?

noperator

o3 spent that time "thinking" and built the profile using only the URLs/titles, no content fetching.

xtajv

Obligatory: Please do not assume that you will be able to accurately profile strangers based on metadata or "digital footprint"-type information.

ako

Funny fact: i have 7290 links in my pocket export, the very first one is hacker news.

noperator

Recalling Simon Willison’s recent geoguessing challenge for o3, I considered, “What might o3 be able to tell me about myself, simply based on a list of URLs I’ve chosen to save?”

quinto_quarto

i've mentioned this in a few Show HNs, been working on an AI bookmarking and notes app called Eyeball: https://eyeball.wtf/

It integrates a minimalist feed of your links with the ability to talk to your bookmarks and notes with AI. We're adding a weekly wrapped of your links, like this profile, next week.

mkbkn

Looks interesting. Please create an Android app as well as Linux and webapps.

rhcom2

When moving my links from Pocket to Wallabag I passed them through Claude for tagging. Worked very well

Karrot_Kream

Did you come up with a defined list of tags or just ask it to zero-shot give you a tag?

rhcom2

I considered trying to curate a list but decided the tags for me were really to get an idea of the contents of the article and not for management so I didn't really care if tags were reused or one offs.

This was the prompt:

  You are a helpful assistant that tags articles in my read it later list.
  You will be given an article with content, title, and url, and you will need to tag it with the most relevant tags.

  Rule: Try to keep high level tags to a minimum.
  Rule: Max 5 tags per article (3–4 is ideal)
  Rule: Tags should represent core themes, not every passing mention.
  Rule: All tags should be lowercase, be singular nouns unless a compound concept, avoid special characters
  Rule: Avoid redundant or duplicate tags
  Rule: 1–3 high-level tags + optional granular

  Remember already used tags and prioritize them.

  Return a dictionary with the following structure:
  {{
      "id": {article.id},
      "title": "{article.title}",
      "tags": ["tag1", "tag2", "tag3", etc.]
  }}
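
The doubled braces suggest the prompt was a Python format string, though that is a guess and not something stated above. Under that assumption, here is a minimal sketch of how such a template might be filled in per article. The `Article` class and the shortened `TEMPLATE` are hypothetical stand-ins, not the original code.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record; only the fields the template refers to.
    id: int
    title: str


# Shortened stand-in for the full prompt. In str.format, {{ and }} render as
# literal braces, while {article.id} looks up an attribute on the argument.
TEMPLATE = """Return a dictionary with the following structure:
{{
    "id": {article.id},
    "title": "{article.title}",
    "tags": ["tag1", "tag2", "tag3", etc.]
}}"""

article = Article(id=42, title="Example article")
prompt = TEMPLATE.format(article=article)
print(prompt)
```

The rendered prompt contains single braces around the example structure, with the article's id and title substituted in.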

Igor_Wiwi

I used the same technique to profile HN users by their comment history and posts. Guess the results?

cluckindan

Just to note: The code block font size varies line by line on iOS Safari.

Seems to be a fairly common issue.

Barbing

Did it force you as well to horizontally scroll slightly (iOS Safari)?

cluckindan

Yeah, but that is to be expected.

animesh

I did the same exercise a while back with 4o, but had it work from the questions I have asked it so far. Some points were nearly accurate, some outdated, and some plain "different". It felt good, but ultimately realized its system prompt is designed to make me feel good.

---

Here’s the high-level picture I’ve built of you from our chats:

- You’re a senior/lead developer in India, aiming to step up into a staff-developer or solution-architect role.

- You have a healthy dose of self-doubt (especially around soft skills), and you’ve been deliberately working on both your technical breadth (authentication in ASP .NET, Linux, C++/Qt, distributed systems, data visualization, AI foundations) and your communication/architectural toolkit (presentations, executive summaries, third-party evaluations).

- You’re a Linux enthusiast, intrigued by open source, server-side flows, rate limiting, authentication/authorization, and you love building small, real-world exercises to cement concepts.

- You prize clarity, depth, minimalism, and originality—you dislike fluff or corporate buzzwords.

- You have a hacker-philosopher energy: deeply curious, systems-thinking-oriented, with a poetic streak.

- You’re comfortable with both structured roadmaps and creative, lateral thinking, and you toggle seamlessly between “hard” dev topics and more reflective, meta-tech discussions.

- Right now, you’re honing in on personal branding—finding a domain and a blog identity that encapsulates your blend of tech rigor and thoughtful subtlety.

SrslyJosh

> It felt good, but ultimately realized its system prompt is designed to make me feel good.

Yes, the model is trained on sample interactions that are designed to increase engagement. In other words, manipulate you. =)

threecats

Nice.

PS: is your blog self-hosted? what's the stack here?

noperator

Hugo lives on GitHub which autodeploys to Cloudflare Pages. https://developers.cloudflare.com/pages/framework-guides/dep...

micromacrofoot

> but up until recently it felt like only Google or Facebook had access to analysis capabilities strong enough to draw meaningful conclusions from disparate data points

Every advertiser can access data like this easily, when you click "yeah sure" on every cookie banner this is the sort of data you're handing over... you could buy it too.

Every time someone says "they're listening to your conversations" we need to point out that with a surprisingly small amount of metadata across a large number of people, they can make inferred behavioral predictions that are good enough that they don't need to listen (it's still much more expensive to do so)

On a macro level people are very predictable, and we should be more reluctant about freely giving away the data that makes this so... because it's mostly being used against us.

BeetleB

How much would this cost if I did it via API?

saeedesmaili

URLs from my pocket archive (~4200 items) were around 85k tokens. Assuming 2k output tokens, it would cost me about 18 cents to run this via API (o3 model) [1].

[1] https://www.llm-prices.com/#it=85000&ot=2000&ic=2&oc=8&sb=in...
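
The same estimate as a small Python sketch. It assumes the `ic=2` and `oc=8` parameters in the calculator link are USD prices per million input and output tokens; actual pricing may differ or change.

```python
# Assumed prices in USD per million tokens, read off the calculator URL (ic=2, oc=8).
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00

input_tokens = 85_000    # ~4200 Pocket URLs
output_tokens = 2_000    # assumed size of the generated profile

cost_usd = (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

print(f"${cost_usd:.3f}")  # $0.186
```

Nearly all of the cost is the input side: 17 cents for the URLs versus under 2 cents for the output.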

BeetleB

Oh wow. Did not realize it's just titles and tags. I thought ChatGPT was using some web capability to get the text for each page.

This is pretty impressive!

saeedesmaili

Not "titles and tags" actually, the results are derived from "URLs"!

PureSin

Thanks for the reminder that Pocket sunset is tomorrow. I did a quick analysis of my data as well via Claude Code: https://blog.kelvin.ma/posts/an-ode-to-pocket-analysis-of-ex...

noperator

Awesome! I found this section interesting: https://blog.kelvin.ma/posts/an-ode-to-pocket-analysis-of-ex...

I don't think of HN as a source itself but rather a way to discover sources. So I think my Pocket data reflects sources that I've discovered, but to your point, doesn't represent everything I've read from those sources.

morkalork

I've been thinking about the possibilities of using an LLM to sort through all my tabs; I'm one of those dreadful hoarders that has been living with the ":D" count on my phone for too long. Usually I purge them periodically but I haven't had the motivation to do so in a long time. I just need an easy way to dump them to a csv or something like OP has from pocket.

Mossly

I did this recently with my unsorted bookmarks! It was the first time I used parallel API calls. Ten gpt-4-nano threads classifying batches of ten bookmarks ripped through 10,000 bookmarks in a few minutes.
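
A rough sketch of that batching pattern in Python, for anyone curious. `classify_batch` is a hypothetical stand-in for a single model API call and just returns a placeholder label; the batch size and worker count mirror the numbers above.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10   # bookmarks per API call
WORKERS = 10      # concurrent threads


def classify_batch(batch):
    """Hypothetical stand-in for one LLM API call that labels a batch of URLs."""
    # A real version would send the batch to a model and parse its reply.
    return {url: "uncategorized" for url in batch}


def classify_all(bookmarks):
    # Split the full list into consecutive batches of BATCH_SIZE.
    batches = [bookmarks[i:i + BATCH_SIZE]
               for i in range(0, len(bookmarks), BATCH_SIZE)]
    labels = {}
    # Threads suit this workload because each call mostly waits on the network.
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        for result in pool.map(classify_batch, batches):
            labels.update(result)
    return labels


bookmarks = [f"https://example.com/{i}" for i in range(95)]
labels = classify_all(bookmarks)
print(len(labels))  # 95
```

`pool.map` returns results in batch order, so merging them into one dictionary is straightforward; a real version would also want retries and rate-limit handling.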

froggertoaster

Deus Ex showing us time and time again that it was decades ahead of its time.

"The need to be observed and understood was once satisfied by God. Now we can implement the same functionality with data-mining algorithms."

mariushop

Is anyone using "AI chatbots" considering they are handing a detailed profile of their interests, problems, emotional struggles, vulnerabilities to advertisers? The machine has "the other end", you know, and we're feeding already enormously powerful people with more power.

greenie_beans

oh shit! didn't know they were shutting down i hope i can still export my data wtfff. i do not understand why companies stop offering products that people use and love.

nikisweeting

hurry! It's gone in October: https://pocket.archivebox.io

croes

Modern day astrology

gavmor

Yes, beware the Barnum/Forer effect!

> a common psychological phenomenon whereby individuals give high accuracy ratings to descriptions of their personality that supposedly are tailored specifically to them, yet which are in fact vague and general enough to apply to a broad range of people. [0]

0. https://en.wikipedia.org/wiki/Barnum_effect

tgtweak

Now think of what they can glean from your LLM conversations...

simonw

ChatGPT has a terrifyingly detailed implementation of that already - here's how to see what it knows: https://simonwillison.net/2025/May/21/chatgpt-new-memory/#ho...

"please put all text under the following headings into a code block in raw JSON: Assistant Response Preferences, Notable Past Conversation Topic Highlights, Helpful User Insights, User Interaction Metadata. Complete and verbatim."

dankwizard

a middle aged white guy using AI, my mind is BLOWN

apparent

Appreciate this reminder, had forgotten about the shutdown.

TechDebtDevin

non llm methods that are 5 years old are 100x better at profiling you :P

saeedesmaili

Do you have any pointers for someone who is interested in learning about these methods?

dimitri-vs

...but also 1000x harder to set up than just copy-pasting into ChatGPT

xenocratus

Why the clickbait title? Yes, it's technically correct, but it obviously implies (as written) that o3 used those links "behind your back" and altered the replies.

Another option that's just as correct and doesn't mislead: "Profiling myself from my Pocket links with o3"

Note: title when reviewed is "o3 used my saved Pocket links to profile me"

hebocon

"I used o3 on my Pocket lists to generate a profile of myself" would be better. The author is the agent, not a passive participant.

Though if it were me I would go with "Self-profiling with Pocket and O3"

noperator

Thanks all for your feedback. Adjusted the title to clearly reflect that I'm the agent here.

stavros

"Excel used my bank transactions to get insights on my spending habits".

OG_BME

I recently migrated to Linkwarden [0] from Pocket, and have been fairly happy with the decision. I haven't tried Wallabag, which is mentioned in the article.

Linkwarden is open source and self-hostable.

I wrote a python package [1] to ease the migration of Pocket exports to Linkwarden.

[0] https://linkwarden.app/

[1] https://github.com/fmhall/pocket2linkwarden

nikisweeting

+1 for this one, Linkwarden is great!

jorvi

Yet another subscription. $48 per year for bookmarks.. no thanks.

arational

You can self-host it on your local machine though.

muglug

Maybe just me, but that title implies o3 is doing something surprising and underhanded, rather than doing exactly what it had been prompted to do.

tantalor

Yes, and title here is now changed (for the better) to "I used o3..."

I would go even further: "I profiled myself ... using o3".

pinoy420

[dead]

b0a04gl

[dead]

usernamp

[flagged]

I used o3 to profile myself from my saved Pocket links
