Session Replays Are Underrated for Bug Detection
Every product team I know has session replay tooling. PostHog, FullStory, Hotjar — pick your flavor. And almost every team uses them the same way: someone reports a bug, you pull up the replay, you watch what happened.
This is reactive. What if replays could be proactive?
The insight
Session replays are, in effect, video recordings of user behavior (most tools actually capture DOM events and reconstruct the session, but the playback reads like video). Vision models are getting very good at understanding video. The connection should be obvious, but almost nobody is making it.
What if you could run every session replay through a vision model and ask: "Did anything unexpected happen here? Did the user seem confused? Did the UI behave in a way that doesn't match the expected flow?"
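Concretely, those questions can just be the prompt. Here's a minimal sketch of how we might phrase it — the wording and the `expected_flow` parameter are illustrative choices, not a fixed API; sampled frames would be attached as images alongside this text when calling the model:

```python
# Hypothetical prompt template for a vision model reviewing replay frames.
# The sentinel string and question list are our own conventions.
ANALYSIS_PROMPT = """You are reviewing frames sampled from a user session replay.
Expected flow: {expected_flow}

Answer each question, citing frame numbers as evidence:
1. Did anything unexpected happen?
2. Did the user seem confused (hesitation, backtracking, rage clicks)?
3. Did the UI behave in a way that doesn't match the expected flow?

If everything looks normal, reply with exactly: NO_FRICTION"""


def build_prompt(expected_flow: str) -> str:
    """Fill in the expected flow for one product journey.

    Frames are sent separately as image parts; this returns only
    the text portion of the request.
    """
    return ANALYSIS_PROMPT.format(expected_flow=expected_flow)
```

A fixed sentinel like `NO_FRICTION` makes the happy path cheap to detect without parsing free-form prose.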
What we're building
This is the core idea behind Flick — our new project that sits on top of PostHog session replay data. Here's how it works:
- PostHog captures session replays as usual
- We sample frames from each replay at key interaction points
- A vision model (currently Gemini, for cost reasons) analyzes the sequence
- We flag sessions where the model detects potential UX friction or visual bugs
The model doesn't need to understand the codebase. It just needs to understand what "normal user flow" looks like versus "something went wrong."
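The sampling step above is where most of the leverage is: you want frames just before and just after each interaction, so the model sees the UI state an event produced. A minimal sketch, assuming a simplified event shape (PostHog's actual replay event schema is richer than this):

```python
from dataclasses import dataclass


@dataclass
class ReplayEvent:
    """Illustrative stand-in for a replay event."""
    ts_ms: int   # milliseconds from session start
    kind: str    # "click", "input", "submit", "scroll", ...


def sample_frame_times(events, window_ms=1000):
    """Pick frame timestamps around key interaction points.

    For each interaction, take one frame shortly before and one
    shortly after, so the model can compare UI state across the
    event. Non-interaction events (e.g. scrolls) are skipped.
    """
    interactions = {"click", "input", "navigate", "submit"}
    times = set()
    for e in events:
        if e.kind in interactions:
            times.add(max(0, e.ts_ms - window_ms))  # state before
            times.add(e.ts_ms + window_ms)          # state after
    return sorted(times)
```

A session with three clicks yields at most six frames instead of thousands, which is what keeps the vision-model bill sane.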
Early results
In our first week of testing on our own product, Flick caught:
- A modal that was rendering behind another element (technically functional, visually broken)
- A form submission that succeeded but showed no feedback to the user
- A loading state that lasted 8 seconds with no spinner or progress indicator
None of these would trigger an error in your logs. None would fail an E2E test. All of them were degrading user experience.
The cost question
Running vision models on every session replay sounds expensive. It is, if you're naive about it. The key is intelligent sampling — you don't need to analyze every frame of every session. Focus on:
- Sessions with high rage-click density
- Sessions that ended without completing the expected flow
- Sessions from new users (where confusion is most costly)
With smart sampling, we're running analysis on about 15% of total sessions at a cost that's comparable to what teams already spend on error monitoring.
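One way to implement that budget is a simple priority score over the signals above, then analyzing only the top slice. A sketch, assuming hypothetical session field names (these are not PostHog's schema) and weights we'd tune empirically:

```python
def friction_score(session: dict) -> float:
    """Heuristic priority: higher = more worth analyzing.

    Weights are illustrative; in practice they'd be tuned against
    confirmed findings.
    """
    score = 0.0
    score += 3.0 * session.get("rage_clicks", 0)
    if not session.get("completed_flow", True):
        score += 5.0  # abandoned the expected flow
    if session.get("is_new_user", False):
        score += 2.0  # confusion is most costly here
    return score


def pick_sessions(sessions, budget_fraction=0.15):
    """Return the top slice of sessions by friction score."""
    ranked = sorted(sessions, key=friction_score, reverse=True)
    k = max(1, int(len(sessions) * budget_fraction))
    return ranked[:k]
```

Ranking rather than thresholding means the spend stays fixed as traffic grows; only the cutoff score moves.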
Where this goes
I think every observability platform will have a vision-model layer within two years. The data is already being collected — session replays, screenshots, visual regression snapshots. The models are good enough. The only missing piece is the integration layer.
That's what we're building.