A reasonable question we get from teams already using Playwright MCP: if the agent can read the DOM, query computed styles, and assert on element positions, why does it need a screenshot at all?
The short answer is that most layout bugs are not in the DOM. They are in the relationship between elements that the DOM does not encode.
What the DOM cannot tell you
Consider a hero section where the CTA button overlaps the subtitle on mobile. Both elements are in the DOM. Both have valid computed styles. Both have display: block and a sensible margin-top. The DOM-level checks all pass.
The bug only exists in the rendered pixels: the subtitle wrapped to two lines on a 375px-wide viewport because the headline was longer than the design assumed, and the CTA’s margin-top was a fixed 24px instead of a clamp(). Nothing in the DOM is wrong. The composition is wrong.
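To make the hero example concrete, here is a minimal TypeScript sketch. The measurements are made up for illustration, and LayoutBox, ElementFacts, passesElementCheck, and boxesOverlap are hypothetical names, not part of Playwright or any library. It assumes each element's box is available as { x, y, width, height } in CSS pixels, which is the shape Playwright's locator.boundingBox() resolves to.

```typescript
// A box in page coordinates, in CSS pixels (assumed shape: x, y, width, height).
interface LayoutBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Per-element facts of the kind a DOM-level check looks at.
interface ElementFacts {
  display: string;
  marginTopPx: number;
  box: LayoutBox;
}

// A per-element check: it never looks at any other element.
function passesElementCheck(el: ElementFacts): boolean {
  return (
    el.display === "block" &&
    el.marginTopPx >= 0 &&
    el.box.width > 0 &&
    el.box.height > 0
  );
}

// True when two boxes share any area. Boxes that only touch along an edge do not count.
function boxesOverlap(a: LayoutBox, b: LayoutBox): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

// Hypothetical measurements at a 375px viewport. The subtitle has wrapped to
// two 28px lines, but the CTA still sits where a one-line subtitle would have
// ended (200 + 28) plus its fixed 24px margin.
const subtitle: ElementFacts = {
  display: "block",
  marginTopPx: 16,
  box: { x: 24, y: 200, width: 327, height: 56 },
};
const ctaButton: ElementFacts = {
  display: "block",
  marginTopPx: 24,
  box: { x: 24, y: 252, width: 327, height: 48 },
};

// Each element is fine on its own; the pair is not.
const eachElementLooksFine =
  passesElementCheck(subtitle) && passesElementCheck(ctaButton);
const pairCollides = boxesOverlap(subtitle.box, ctaButton.box);
```

Both elements pass on their own, and the collision only shows up once the two boxes are compared. That comparison is cheap to write, but it only runs if someone already suspected this pair at this viewport.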
This category includes:
- Overlap and collision — two correctly-positioned elements that happen to occupy the same space at one breakpoint
- Whitespace asymmetry — a card grid where one card is taller and the row gap looks broken
- Color contrast — text that passes WCAG in isolation but disappears against a gradient background
- Image composition — a hero photo with the subject cropped out by an object-position default
- Z-order surprises — a tooltip that renders behind a sticky nav
A DOM dump can describe each of these elements. It cannot tell you they look wrong together.
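The color contrast case can be worked through with the WCAG 2.x contrast formula. This sketch assumes white text over a hypothetical linear gradient from #003366 to #cccccc, with #003366 also set as the element's background-color. The colors, the sample points, and the helper names are illustrative only. A check that reads only the computed color and background-color sees just the first stop.

```typescript
type Rgb = [number, number, number]; // 0-255 per channel

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance([r, g, b]: Rgb): number {
  const linear = (channel: number): number => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// WCAG contrast ratio, from 1 (identical colors) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Simple per-channel interpolation between two gradient stops, t in [0, 1].
function gradientColorAt(start: Rgb, end: Rgb, t: number): Rgb {
  const mix = (from: number, to: number): number =>
    Math.round(from + (to - from) * t);
  return [mix(start[0], end[0]), mix(start[1], end[1]), mix(start[2], end[2])];
}

const AA_NORMAL_TEXT = 4.5; // WCAG AA minimum for normal-size text

// Hypothetical values for illustration.
const textColor: Rgb = [255, 255, 255];        // #ffffff
const declaredBackground: Rgb = [0, 51, 102];  // background-color: #003366
const gradientEnd: Rgb = [204, 204, 204];      // last gradient stop: #cccccc

// What an attribute-level check sees: text against the declared background color.
const passesInIsolation =
  contrastRatio(textColor, declaredBackground) >= AA_NORMAL_TEXT;

// What the reader sees: text against every point of the painted gradient.
const samplePoints = [0, 0.25, 0.5, 0.75, 1];
const worstOnGradient = Math.min(
  ...samplePoints.map((t) =>
    contrastRatio(textColor, gradientColorAt(declaredBackground, gradientEnd, t)),
  ),
);
const passesOnGradient = worstOnGradient >= AA_NORMAL_TEXT;
```

With these values the text clears the 4.5:1 threshold against the declared background color and falls well below it at the light end of the gradient, which is where a reader would lose it.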
Why the agent benefits more than a human
A human developer notices these problems peripherally — you scroll past and your eye catches the off-balance spacing without consciously checking. An agent reading a DOM dump has no peripheral vision. It has to ask the right query about the right element, and most layout bugs require asking a query you did not know to ask.
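A rough sense of why asking the right query is hard: overlap is a property of a pair of elements, so covering it exhaustively means one check per pair per breakpoint. The arithmetic below uses hypothetical element and breakpoint counts, and overlap is only one of the relationships listed earlier.

```typescript
// Number of distinct pairs among n elements: n choose 2.
function pairCount(elementCount: number): number {
  return (elementCount * (elementCount - 1)) / 2;
}

// Pairwise checks needed to cover every pair at every breakpoint.
function totalPairChecks(elementCount: number, breakpointCount: number): number {
  return pairCount(elementCount) * breakpointCount;
}

// Hypothetical page sizes, three breakpoints each.
const checksForSmallPage = totalPairChecks(40, 3);   // 780 pairs * 3 = 2340
const checksForLargerPage = totalPairChecks(200, 3); // 19900 pairs * 3 = 59700
```

The count grows with the square of the number of elements, so an agent working from a DOM dump has to guess which few pairs are worth checking.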
A screenshot inverts the problem. The agent looks at the rendered page the same way a designer would, and the bugs that are visually obvious become text-extractable: “the second pricing card is 12px taller than the others,” “the CTA button overlaps the subtitle below 400px width.”
When DOM dumps still win
DOM inspection is the right tool for state verification — is the form in an error state, does the dropdown have the right options, is the loading spinner visible. Anything where the question is “does this element exist with this attribute” stays cheaper as a DOM query than a screenshot.
The mental model: DOM for state, screenshots for composition. Most agents need both, and most teams already have the first half.
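One way to encode this split in an agent's check planner, as a sketch only. The Question shape, the tool names, and the selectors are hypothetical and not taken from Playwright MCP or any other tool.

```typescript
// A question the agent wants answered about the page (hypothetical shape).
type Question =
  | { kind: "state"; selector: string; attribute: string; expected: string }
  | { kind: "composition"; prompt: string };

type Tool = "dom-query" | "screenshot";

// DOM for state, screenshots for composition.
function chooseTool(question: Question): Tool {
  switch (question.kind) {
    case "state":
      return "dom-query";
    case "composition":
      return "screenshot";
  }
}

// Group questions by the tool that should answer them.
function planChecks(questions: Question[]): Record<Tool, Question[]> {
  const grouped: Record<Tool, Question[]> = { "dom-query": [], screenshot: [] };
  for (const question of questions) {
    grouped[chooseTool(question)].push(question);
  }
  return grouped;
}

// Illustrative questions; the selectors and attributes are made up.
const checkPlan = planChecks([
  { kind: "state", selector: "#signup-form", attribute: "aria-invalid", expected: "true" },
  { kind: "state", selector: "#loading-spinner", attribute: "aria-hidden", expected: "false" },
  { kind: "composition", prompt: "Does the CTA overlap the subtitle at 375px?" },
]);
```

Grouping up front lets the cheaper DOM queries run for every state question, with a screenshot taken only when a composition question is present.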
Try it
Give your agent eyes in 30 seconds.
One CLI command. 100 visual checks free every month. No credit card.