Idea List

Open ideas. If you decide to build any of these, let me know!

CLI tool that takes an image and sticks it on the web as a permalink. Currently I copy images into GitHub and steal the link that it generates.
Reresume: upload a resume and a job description link and get a rewritten resume tailored to the job.
Sweresume: automatic LaTeX resumes for software engineers following a standard template. 1
Convert research papers to running code automatically, or at least a repo scaffold.
Match post writers across the web using stylometry.
Receipt extraction for splitting meals and groceries. Just take a picture of what you ate or purchased and have an LLM do the math. Could be possible when the GPT-V API drops. Fuyu-8B might work as well.
Build a knowledge graph of the code blocks in an entire repository. There can be an agent that operates on this graph asynchronously. I believe Cursor is working on something like this.
Interview prep using ChatGPT + voice mode.
Write a blog post evangelizing VS Code.
Embed your entire Twitter/Discord/etc and make connections from your likes, followers, etc.
Display some of the best pieces of writing on the Internet in a common place, with beautiful styling. For example all of the Paul Graham essays, some Twitter posts, Ted Chiang essays like Understand that only exist on old web archives.
Dscan: scan through your data and “swipe left/right” on good/bad examples to manually create a training set. 1, 2
Convert images to LaTeX, could be useful for converting complicated graphs or diagrams.
Automatically teach yourself using short form videos like the Subway Surfers TikToks. Can query content from anywhere: Reddit, Twitter, etc. Need to make a more general data querying tool that I can use for things.
Summarize any subreddit with LLMs by querying the RSS.
Rebuild popular products like Postman as free, open-source tools.
An AR tool that lets you query references you like to make. Imagining something like Family Guy cutaways, effectively querying your memories.
A way to modify a calendar state by directly altering the JSON version with an LLM.
Embed solutions to LeetCode problems to compare similarity.
A language model to translate mixed languages like Chinglish or Hinglish.
AB testing for agentic web browsing.
Automatically create YouTube shorts/TikToks from subreddits.
Solve the issue where you have a ton of Jupyter notebooks all with slightly different variations of the same pre-processing code.
A little guy who organizes and cleans up your codebase while you’re not using your computer.
Link entire codebases together graphically and automatically and allow users to parse through the function calls in a spatial UI – “spatial coding”.
Map coding error messages to their solutions to build up a repository of personal debugging tricks automatically.
- A model could use this data for its own exploration when it's stuck, so it learns from past stack traces + Google search.
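A rough sketch of how the error-to-solution mapping could work, assuming the key is just the last line of the error with volatile details (addresses, quoted names, numbers) masked out. `error_signature` and `DebugNotebook` are hypothetical names made up for illustration, and the masking rules are guesses rather than a tested heuristic.

```python
import re
from collections import defaultdict


def error_signature(message: str) -> str:
    """Reduce an error message to a stable key by masking volatile details."""
    lines = message.strip().splitlines()
    sig = lines[-1] if lines else ""                # last line usually names the exception
    sig = re.sub(r"0x[0-9a-fA-F]+", "<addr>", sig)  # memory addresses
    sig = re.sub(r"(['\"]).*?\1", "<str>", sig)     # quoted names and paths
    sig = re.sub(r"\d+", "<n>", sig)                # line numbers, sizes, ids
    return sig


class DebugNotebook:
    """In-memory store of fixes, keyed by normalized error signature."""

    def __init__(self):
        self._fixes = defaultdict(list)

    def record(self, message: str, fix: str) -> None:
        self._fixes[error_signature(message)].append(fix)

    def lookup(self, message: str) -> list:
        return list(self._fixes.get(error_signature(message), []))


if __name__ == "__main__":
    notebook = DebugNotebook()
    notebook.record("Traceback (most recent call last):\nKeyError: 'user_id'",
                    "use dict.get() with a default")
    # A different missing key maps to the same signature, so the old fix shows up.
    print(notebook.lookup("KeyError: 'email'"))
```

Exact-match signatures are the simplest thing that could work; swapping the dictionary lookup for an embedding search would catch errors that are similar but not identical.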
GPT vision tools:
- Better autonomous web browsing
- Receipt extraction
- Math homework helper
- Image to LaTeX converter
- Calorie counter by converting image to ingredients + use RAG to compare against nutrition facts
- Recipe generator from ingredients
- Diagram generator from whiteboard
- Excel analyzer
- Detect fake vs. real sneakers
- Structure-anything: convert anything to structured data
- An “anything API” since it can see the whole web
- Something that replaces Selenium/Playwright for web scraping
- Solve web accessibility
- Analyze screenshots of audio waves
Actually good dev-tools for things like base64 conversion, OpenAI token counting, etc. Good for LLM devs and regular devs alike.
Better interface for constant web-scraping using this.
Convert an MIT course into useful lecture notes. Should add a sliding scale that lets you modify the difficulty, can do this by pre-generating content for different education levels. 1
Compressing thought and expanding it again is like noising and denoising in diffusion models.
Continuously summarize HN, Reddit, etc. and generate podcast episodes, similar to ScribePod.
Better way to map things you like in a city. Felt is doing a great job at this.
Generate color palette suggestions from aesthetic images.
Nice view components for Spotify and other apps to make it easy to drop on your website.
Chrome extension for bionic reading.
BeReal memories downloader.
Notion page for all LeetCode problems with better solutions.
Predict Stable Diffusion prompts from the images, can train on DiffusionDB.
LeetCode but for debugging. There is currently no way to practice scenario-specific debugging. Solved: 1
Stable Diffusion + Recaptcha.
Twitter except it’s just posts your friends have liked.
An easy way to analyze scrapes of your own liked tweets in embedding space, something better than grep.
Website to help develop intuition on hard math/CS concepts, best quality explainers.
Watermarking handwriting/typed text algorithmically, something like what Scott Aaronson worked on at OpenAI.
Grammarly + GPT.
Teach students things on TikTok by distilling MIT OCW into short form clips for coding and such.
Hack viral short form marketing for any product launch.
Train a model to chunk a long passage into human friendly sections. 1
Convert git commit history to a changelog. It could be something you add to a repo (GitHub action?) that keeps the README updated for instance.
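A minimal sketch of the commit-to-changelog step, assuming the commit subjects follow a Conventional Commits style prefix (`feat:`, `fix:`, `docs:`). That convention is my assumption, not something every repo uses, and the `SECTIONS` mapping and `build_changelog` name are made up for illustration. The subjects could come from `git log --pretty=format:%s`.

```python
import re

# Assumed mapping from commit-type prefixes to changelog headings.
SECTIONS = {"feat": "Added", "fix": "Fixed", "docs": "Documentation"}

# Matches subjects like "feat(cli): add upload command" or "fix: handle empty file".
PATTERN = re.compile(r"^(?P<type>\w+)(\([^)]*\))?!?:\s*(?P<summary>.+)$")


def build_changelog(subjects):
    """Group commit subjects under headings and render them as text."""
    grouped = {}
    for subject in subjects:
        subject = subject.strip()
        match = PATTERN.match(subject)
        if match and match.group("type") in SECTIONS:
            heading = SECTIONS[match.group("type")]
            summary = match.group("summary")
        else:
            # Unrecognized or unprefixed commits are kept verbatim.
            heading, summary = "Other", subject
        grouped.setdefault(heading, []).append(summary)

    lines = []
    for heading in [*SECTIONS.values(), "Other"]:
        if heading in grouped:
            lines.append(f"## {heading}")
            lines.extend(f"- {item}" for item in grouped[heading])
            lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_changelog([
        "feat(cli): add upload command",
        "fix: handle empty file",
        "Merge branch 'main'",
    ]))
```

Repos with free-form commit messages would land everything under "Other", which is where an LLM pass to classify and rewrite subjects could slot in.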
Take any data and convert it into a knowledge graph, with a simple API that makes it useful for other projects.
Build a realtime navigation tool for blind people into an iPhone app. 1
Visualize information as embeddings, like taking a textbook and converting it into a cloud graph. 1, 2
Segment a webpage → embed it → select pieces based on a nearest neighbor search of the embeddings/query. Requires contrastive data for a web snippet/text, like CLIP.
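The segment, embed, and nearest-neighbor pipeline can be sketched end to end with a toy embedding. Here a bag-of-words count vector stands in for the learned (CLIP-like) embedding the idea actually calls for, and splitting on blank lines stands in for real webpage segmentation. Both are placeholder assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def segment(page_text):
    """Split page text into paragraph-level segments on blank lines."""
    return [part.strip() for part in re.split(r"\n\s*\n", page_text) if part.strip()]


def embed(text):
    """Toy embedding: word counts. A learned model would replace this."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(page_text, query, k=1):
    """Return the k segments most similar to the query."""
    query_vec = embed(query)
    segments = segment(page_text)
    ranked = sorted(segments, key=lambda s: cosine(embed(s), query_vec), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    page = (
        "Pricing starts at ten dollars per month.\n\n"
        "Contact us by email for support.\n\n"
        "Our team is based in Toronto."
    )
    print(top_k(page, "how much is the pricing per month"))
```

Word overlap only works when the query shares vocabulary with the page, which is exactly the gap the contrastive training data is meant to close.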
Try using CogVLM to break Captchas. 1
Build a text editor that periodically takes snapshots of the text area and sends them to GPT for autocorrect, an auto editing writing editor basically.
Finetune an LLM on your own tweets.
Rebuild this but with GPT-V.
Route to any model, cloud-based or local, with ease.
Visualize loss curves in 3D with some cool interface.
So much to build with Anthropic MCP, Anthropic Computer Use, and Gemini Multimodal Live API.
Generate UI with a few keystrokes.
Customize podcasts for your ears – speaker diarization to speed up individual voices, edit pods at high level, easy to download to your device.
What to do while waiting for reasoning LLMs to think? Potentially a huge advertising play here.
LeetCode, except every problem is a real world scenario instead of a technical problem statement. The purpose is to understand where algorithms are applied. Could implement using a Chrome extension that calls out to an LLM to convert the problem statement into a real world scenario, and still run the code on LeetCode.
Automatically run git bisect when debugging and tell an LLM what bug you’re looking for. Like you can ask it to find which commit broke a button or something, and it can go render the page at different states using bisect.
Goodreads with minimal, modern UI.
Chrome extension that lets you hide people on Twitter who are being annoying, using something funny like a CSGO AWP.
Vibe check industries by taking the average sentiment of a subreddit (“how are the folks on r/CSMajors doing today?”).
Track company job openings over time to see if they’re actually reducing hiring due to the advent of AGI.
Is it AI? A game to determine if a poem is AI generated or not.
Crowdsourced map of AI startup offices to see how the epicenter of AI changes over time.
March Madness prompt battle. Select a model, write a prompt, and pay $5 to let your AI bracket compete against others.
More vibe coding with Three.js. 1
Stock analyst in your email. Give it a few tickers you care about and it’ll use Perplexity’s API for research + Claude for summaries.
Text a number to create a Linear ticket (or maybe use iOS shortcuts).
Let people vote on the quality of software tools: a realtime poll of the best languages, apps, etc. Login with GitHub. Tools like Flighty and Beli fly under the radar and should be more visible.
Scrape SF housing data from Craigslist in realtime and filter for the best deals with LLMs.
Auto-updating spreadsheet of all niche SaaS apps and stats.
Automatically route to the best LLM for a given task based on current latency.
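One way the latency-based routing could look, as a sketch: keep a rolling window of recent response times per model and pick the lowest average. The `LatencyRouter` class, the window size, and the model names are all hypothetical, and this ignores whether a model is actually good at the task.

```python
from collections import deque


class LatencyRouter:
    """Pick the model with the lowest rolling-average latency."""

    def __init__(self, models, window=20):
        # One bounded queue of recent latencies (in seconds) per model.
        self._samples = {model: deque(maxlen=window) for model in models}

    def record(self, model, seconds):
        """Store an observed response time for a model."""
        self._samples[model].append(seconds)

    def average(self, model):
        samples = self._samples[model]
        # Untried models report 0.0 so they get tried at least once.
        return sum(samples) / len(samples) if samples else 0.0

    def pick(self):
        """Return the model name with the lowest current average."""
        return min(self._samples, key=self.average)


if __name__ == "__main__":
    router = LatencyRouter(["model-a", "model-b"], window=2)
    router.record("model-a", 1.0)
    router.record("model-b", 0.5)
    print(router.pick())  # model-b is faster so far
    router.record("model-b", 3.0)
    router.record("model-b", 3.0)
    print(router.pick())  # model-b slowed down, so model-a wins
```

A per-task allowlist of models that are good enough would sit in front of `pick`, so latency only breaks ties among acceptable candidates.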
Scrape 4chan for alpha from the LLM/CV communities.
Run stylometry on popular Twitter accounts to figure out if people are ghostwriting for others.
There may be something interesting to do with NotebookLM if they ever release an API.
GitHub style commit graph for Congress. 1
There are a million companies to start that use GPT-4o native image generation in some clever way.
Cool ways to store data like chess games. 1
Solve Dunnet with LLMs automatically.
LLMs for analyzing logs and automatically finding errors.
Guess how many likes a Tweet or TikTok has by reading the content alone, to tune your internal model.
Cursor extension for adding meta comments that only the LLM can see.
Read every word a person has ever written by having LLMs scrape the internet for everything.
Unedited, real-time timelapses of people using AI, to understand how the most effective work is done.
Post what you got done this week anonymously.
Use Claude computer use or BrowserBase for headless automation.
Use some of the new video models like Veo, HeyGen, Ideogram, Runway, etc.
LaTeX codegen → Vision LM: let the LLM generate LaTeX, see the output and iterate.
PDF to brainrot.
Dwarkesh podcast generator for papers and articles, turn any text into a podcast (like NotebookLM).
Use dithering for something cool. Visual Electric is also a really cool image generation tool.
Unbrick the Car Thing from Spotify.
Map OpenAI releases to employee GitHub activity.
Track terrible journalist takes from a long time ago on a website. Maybe some way to rank and grade them?
Pictionary except you have to guess the prompt used to generate the image, use something like FAL.
Compute sentiment of quotes of a Tweet, figure out if people are mad at it.
Let Claude control an iPhone with iPhone mirroring + computer use.
Pictionary where you draw and the LLMs try to guess.
A playground to quickly iterate on AI UX interfaces (I guess v0 is basically this).
hnfast.com: get the high level updates.
GazeLLE for % eye contact made in a podcast, figure out who can keep eye contact.
Shortcut (maybe text) for creating calendar events from text.
HUD for autocomplete while you’re in a meeting, so you can easily figure out what to say next.
Automatically ingest the Internet on a user’s behalf and screen it for memetic viruses. “Internet condom”. 1

Ishan's Cafe
