Chapter 12 of 12 · 2 min read
Known Limits and the Research Ahead
What the beta demonstrates, what it misses, and what must be validated next.
Inside ~Cortisol Checker~ · v1.0 · Moses Sam Paul
The working beta demonstrates that a registered expression can resolve to a private deterministic analysis engine, return an inspectable result, preserve a protocol trace, separate safety routing, and support an optional participant-authored response without changing the content score. It does not establish that the score is universally valid.
The most revealing current limitation is also pinned as a public fixture:
Everything is collapsing. Share this now before they delete it and wake everyone up.
To a human reader, the sentence resembles doom and amplification language. In 0.4.1, it returns zero because the present rules require specific bounded phrases such as “share before,” “wake up,” or defined doom constructions. “Share this now before” and “wake everyone up” do not match those exact patterns. The low-confidence result is reproducible and wrong-footed in an informative way.
This should not be hidden. It shows the tradeoff of a deterministic lexicon: the mechanism is inspectable, but its coverage is finite. Adding broader expressions may fix one miss while creating new false positives. Calibration therefore needs versioned cases, adversarial review, multilingual testing, and documented reasons for every change.
Other known limits include:
- negation, quotation, satire, and reported speech;
- meaning that depends on a linked post or prior conversation;
- dialect, code-switching, and languages outside the current English rules;
- false confidence from long text with little meaningful evidence;
- typography cues that may reflect style rather than manipulation;
- semantic-floor thresholds chosen for safety and presentation, not externally validated physiology;
- self-report bias and the absence of longitudinal validation.
Generalisation is not a theoretical footnote. A 2024 PNAS study found that relationships between depression severity and several social-media language markers differed between matched Black and White participants, and that tested prediction models performed poorly for Black participants (Rai et al., 2024). That study concerns depression rather than this checker, but it directly cautions against assuming that psychological language markers travel unchanged across populations. The present English lexicon has not yet passed comparable demographic or cross-cultural validation.
Versioned rules, evidence, fixtures, safety routing, and bounded identity participation.
False positives, false negatives, adversarial cases, accessibility, and multilingual review.
Compare components with participant reports under approved research methods.
No biological, predictive, clinical, or economic claim without a suitable study.
The next version should be better because its errors are recorded, not because its complexity is hidden. Public methods, immutable fixtures, contract versions, and clear claim boundaries are the path from an interesting prototype toward a credible research instrument.