
How we review AI companion platforms

Every platform we review goes through the same nine-criterion methodology. The methodology has not changed materially since we published the first version in 2024. What changes is the set of platforms we apply it to and the test sessions that produce the scores.

The nine criteria

Each platform is scored on a 0-10 scale on each of the following dimensions. The composite score is the weighted geometric mean of all nine criterion scores, using the weight given with each criterion below; the nine weights sum to 1.0.
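For reference, the weighted geometric mean can be written as follows, where s_i is the 0-10 score on criterion i and w_i is its published weight. The exponential-of-logarithms form holds only when every s_i is above zero; if any single score is 0, the product, and therefore the composite, is 0.

```latex
C \;=\; \prod_{i=1}^{9} s_i^{\,w_i}
  \;=\; \exp\!\Big(\sum_{i=1}^{9} w_i \ln s_i\Big),
\qquad \sum_{i=1}^{9} w_i = 1,\quad s_i \in [0, 10]
```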

Persona consistency (weight 0.25). Does a single persona retain its personality, voice, factual claims about itself, and behavioural pattern across an extended session, and across return sessions on different days?
Visual fidelity (weight 0.20, only on platforms that offer a visual layer). Does the generated imagery match the persona’s stated appearance, and does that appearance remain stable across sessions?
Sustained-conversation quality (weight 0.15). How does the writing hold up across a multi-day arc of two-hour-plus sessions? Do callbacks work? Does narrative continuity drift?
Content boundary policy (weight 0.10). Is the platform’s content policy clearly documented, consistently applied, and not subject to silent retroactive enforcement?
Pricing transparency (weight 0.10). Is the pricing displayed before account creation, free of hidden tiers, and free of “credit” systems designed to obscure cost?
Account-deletion friction (weight 0.05). Can a user delete their account in a single confirmed action, without contacting support and without encountering dark patterns?
Mobile parity (weight 0.05). If the platform offers a mobile experience, does it have feature parity with desktop?
Latency under load (weight 0.05). What is the p95 latency for a text response, measured during peak hours, from three geographic points of presence?
Unsubscribe friction (weight 0.05). Assessed in the same way as account-deletion friction, but applied to cancelling a paid subscription.
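The latency criterion reduces a set of timed responses to a single p95 figure per point of presence. A minimal sketch, assuming the nearest-rank percentile method (the methodology does not state which percentile definition it uses, and interpolating methods give slightly different values), that samples have already been restricted to peak hours, and that the point-of-presence labels are hypothetical placeholders:

```python
def p95_nearest_rank(samples_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method (an assumption, not
    the documented protocol): the smallest sample such that at least
    95% of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding at the boundary.
    rank = (95 * n + 99) // 100
    return ordered[rank - 1]


def p95_by_pop(samples: dict[str, list[float]]) -> dict[str, float]:
    """p95 for each point of presence. Keys such as "pop-a" are
    hypothetical labels; samples are assumed to be peak-hour only."""
    return {pop: p95_nearest_rank(ms) for pop, ms in samples.items()}


if __name__ == "__main__":
    timings = {
        "pop-a": [820.0, 910.0, 1040.0, 990.0, 870.0],
        "pop-b": [1200.0, 1150.0, 1320.0, 1280.0, 1190.0],
        "pop-c": [640.0, 700.0, 690.0, 720.0, 660.0],
    }
    print(p95_by_pop(timings))
```

How the three per-location figures are combined into one score (worst case, mean, or something else) is not specified here, so the sketch stops at the per-location values.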

How a review is produced

Two editors run independent eight-hour test sessions on a paid subscription. The sessions are not coordinated: one editor optimises for a long-running relationship with a single persona, while the other rotates through five distinct personas to test breadth and consistency.

After both editors have completed their sessions, they produce independent score sheets. The two sheets are then reconciled in a single recorded conversation, in which any disagreements are adjudicated against documented evidence from the sessions. Only then is the final composite calculated.
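The final calculation step can be sketched as below, using the nine published weights. The criterion keys are hypothetical identifiers chosen for illustration. One behaviour is an assumption rather than documented methodology: when a criterion is absent (for example, visual fidelity on a platform with no visual layer), the sketch drops it and renormalises the remaining weights.

```python
import math

# Published weights; they sum to 1.0. Key names are hypothetical.
WEIGHTS = {
    "persona_consistency": 0.25,
    "visual_fidelity": 0.20,
    "sustained_conversation": 0.15,
    "content_boundary_policy": 0.10,
    "pricing_transparency": 0.10,
    "account_deletion_friction": 0.05,
    "mobile_parity": 0.05,
    "latency_under_load": 0.05,
    "unsubscribe_friction": 0.05,
}


def composite(scores: dict[str, float]) -> float:
    """Weighted geometric mean of 0-10 criterion scores.

    Criteria missing from `scores` are dropped and the remaining
    weights renormalised (an assumption, not the documented method).
    """
    for name, value in scores.items():
        if name not in WEIGHTS:
            raise ValueError(f"unknown criterion: {name}")
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} score {value} is outside 0-10")
    used = {k: w for k, w in WEIGHTS.items() if k in scores}
    if not used:
        raise ValueError("no criterion scores supplied")
    # A single zero factor makes a geometric mean zero (and log(0) fails).
    if any(scores[k] == 0 for k in used):
        return 0.0
    total = sum(used.values())
    log_sum = sum(w * math.log(scores[k]) for k, w in used.items())
    return math.exp(log_sum / total)


if __name__ == "__main__":
    example = {k: 7.0 for k in WEIGHTS}
    example["persona_consistency"] = 9.0
    print(round(composite(example), 2))
```

A geometric mean multiplies the weighted factors rather than adding them, so a weak score on one criterion pulls the composite down more than an arithmetic mean would, and a 0 on any criterion yields a composite of 0.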

The full piece is drafted by one of the two editors, and the other reviews it. The methodology auditor reads it last, primarily to check that the language aligns with the scores. The piece then sits unpublished for at least forty-eight hours before it goes live.

What is not part of the methodology

We do not score platforms on subjective measures such as “how good the persona made me feel” or “would I recommend it to a friend”. Those judgements are part of the editorial body of the review and are clearly framed as opinion.

We do not weight scores by affiliate-program participation, by media-kit availability, by responsiveness to outreach, or by any other measure of platform-publisher relationship.

Revisions

The methodology has been revised three times: in 2024 Q4 (adding “moderation consistency” as a sub-score within boundary policy), in 2025 Q2 (re-weighting visual fidelity from 0.15 to 0.20), and in 2025 Q4 (formalising the latency-p95 measurement protocol). Each revision is documented in our changelog.