Ghost Letters: The Hidden Signatures AI Leaves in Your Text

Exploring Ideas: A Blog on Technology, Startups, Food, and More

Ever notice something weird when you copy text from ChatGPT or Claude and paste it into your document? Maybe the quotes look fancier than usual, or there’s a dash that seems… different somehow. You’re not losing your mind. AI tools are leaving their fingerprints all over our text, and once you start noticing them, you’ll see them everywhere.

These signatures come in two flavors: the ones you can see (if you know what to look for) and the ones that are completely invisible. Both tell a story about how AI works, why it behaves this way, and what that means for anyone trying to spot machine-generated content.

The Fancy Punctuation Problem

Let’s start with the obvious stuff. When most humans write casually, we keep things simple:

Straight quotes instead of curly ones
Regular hyphens rather than em and en dashes
Three separate periods (...) when we want to trail off

But AI? AI has standards. It consistently defaults to what typesetters call “proper” punctuation:

Curly quotation marks instead of straight ones
Em dashes for dramatic pauses instead of hyphens
En dashes for ranges like “pages 10–15”
Single-character ellipses instead of three separate dots

This isn’t AI trying to show off. It’s a side effect of training data. These models learned from professionally edited books, academic papers, and high-quality journalism, all sources that follow traditional typographic standards. When Wikipedia uses proper em dashes and The New York Times formats quotes correctly, that’s what AI absorbs as “normal.”

The result? AI-generated text often looks suspiciously polished compared to typical human writing. It’s like having a friend who always uses perfect grammar in text messages. Technically correct, but somehow… too perfect.

The Em Dash Epidemic

Speaking of em dashes, AI has developed what can only be described as an unhealthy obsession with them. These long dashes were once tools of nuance, used sparingly for emphasis or to set off explanatory phrases. Now they’re everywhere in AI writing, often replacing commas, parentheses, or simple periods.

The problem isn’t that em dashes are wrong: they’re perfectly valid punctuation. The issue is frequency and context. Human writers might use an em dash for dramatic effect or to create a specific rhythm. AI uses them because it learned they’re “proper” punctuation, without understanding when they’re actually appropriate.
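
Since the issue is frequency, it helps to put a number on it. The sketch below is a rough illustration, not an established detector: the function name em_dashes_per_1000_words is hypothetical, the word count is a simple regex approximation, and no particular threshold is implied.

```python
import re

EM_DASH = "\u2014"  # the em dash character, written as an escape for clarity


def em_dashes_per_1000_words(text: str) -> float:
    """Return the number of em dashes per 1,000 words (0.0 for empty text)."""
    # Approximate word count: runs of letters, digits, or underscores.
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return text.count(EM_DASH) * 1000 / len(words)
```

Comparing this rate between a draft and your own earlier writing is more informative than any absolute cutoff, since plenty of human writers like em dashes too.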

It’s like having a dinner guest who insists on using the formal silverware for every course, including dessert. Technically not wrong, but it makes everyone else feel like they’re doing something incorrectly.

The Invisible Layer: Zero-Width Characters

Now for the really sneaky stuff. Beyond fancy punctuation lies a world of completely invisible characters that AI tools, particularly “humanizer” services, embed in text. These ghost letters live in the shadows of Unicode, serving various purposes from watermarking to detection evasion.

Here are the main culprits:

Zero-Width Space (ZWSP): Invisible characters that can break up word patterns without affecting appearance. A sentence might look normal but contain dozens of these hidden spaces.

Zero-Width Non-Joiner (ZWNJ): Originally designed for complex scripts like Arabic, these can separate characters that would normally connect.

Soft Hyphens: Invisible line-break hints that only appear when text wraps, but can persist in copied content.

Directional Formatting Characters: Control characters that specify text direction, sometimes left behind even in English text.
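
All of the characters above belong to the Unicode general category “Cf” (format), which gives a convenient way to make them visible. This is a minimal sketch using Python’s standard unicodedata module; the function name reveal_invisibles and the `<U+XXXX>` tag format are assumptions chosen for illustration.

```python
import unicodedata


def reveal_invisibles(text: str) -> str:
    """Replace each Unicode format character (category Cf) with a visible tag."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            # Show the code point, e.g. <U+200B> for a zero-width space.
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)
```

Running this over pasted text turns a sentence that looks clean into one where every hidden character is spelled out in place.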

Why AI Tools Do This

The reasons vary depending on the tool and context:

Training Data Legacy: As we covered, AI models inherit the formatting standards of their training materials. Professional publications use proper typography, so AI does too.

Watermarking: Some AI services embed invisible signatures to track their content or prove ownership later.

Detection Evasion: “AI humanizer” tools deliberately insert hidden characters to confuse detection algorithms, breaking up patterns that might flag text as machine-generated.

Copy Protection: Hidden characters can help identify when content has been copied from a specific source.

Unintentional Artifacts: Sometimes these characters are just digital detritus, leftovers from text processing that didn’t get cleaned up properly.
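
To make the watermarking idea concrete, here is a toy sketch of how a short tag could ride along invisibly in a piece of text. It is not the scheme used by any real service; the functions embed_tag and extract_tag and the bit-to-character mapping are all hypothetical, chosen only to show the principle.

```python
ZERO = "\u200B"  # zero-width space stands for bit 0 (assumed mapping)
ONE = "\u200C"   # zero-width non-joiner stands for bit 1 (assumed mapping)


def embed_tag(text: str, tag: str) -> str:
    """Append the tag's bits to text as invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in tag.encode("utf-8"))
    return text + "".join(ONE if b == "1" else ZERO for b in bits)


def extract_tag(text: str) -> str:
    """Recover a tag hidden by embed_tag; returns "" if none is present."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore any incomplete trailing byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")
```

The marked string renders exactly like the original, yet it is longer and carries the tag. A mark like this is also fragile: anything that strips zero-width characters erases it.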

Why This Matters

Beyond the immediate annoyance factor, these signatures have broader implications:

Academic Integrity: Students using AI tools might unknowingly submit content with hidden fingerprints that detection systems can find.

Professional Writing: Invisible characters can cause formatting issues when text moves between different systems or platforms.

Security Concerns: Hidden Unicode can potentially be used for steganographic communication or to bypass content filters.

Authenticity Questions: As these patterns become better known, they’re increasingly used to identify AI-generated content in contexts where that matters.

Detection and Cleaning

Want to see what’s hiding in your text? Here’s a simple approach:

Visual Detection

For the visible signatures, you’re looking for:

Curly quotes when you expected straight ones
Long dashes that seem different from your usual hyphens
Ellipses that look slightly different from three periods
Generally “too perfect” punctuation in casual writing
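
If eyeballing punctuation feels unreliable, a short script can count the visible signatures for you. This is a sketch under stated assumptions: the TYPOGRAPHIC table and the count_typographic function are hypothetical names, and the table only covers the characters discussed in this post.

```python
from collections import Counter

# Typographic characters discussed above, keyed by code point.
TYPOGRAPHIC = {
    "\u2018": "left single quote",
    "\u2019": "right single quote / apostrophe",
    "\u201C": "left double quote",
    "\u201D": "right double quote",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2026": "single-character ellipsis",
}


def count_typographic(text: str) -> dict:
    """Count each typographic character present in text, keyed by its name."""
    counts = Counter(ch for ch in text if ch in TYPOGRAPHIC)
    return {TYPOGRAPHIC[ch]: n for ch, n in counts.items()}
```

A nonzero count proves nothing on its own, since many word processors insert curly quotes automatically, but it tells you quickly which characters to look at.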

Hidden Character Detection

For invisible characters, you’ll need tools. Here’s a Python snippet that reveals common culprits:

def find_hidden_chars(text):
    """Return one summary line for each kind of hidden character found in text."""
    # Code points to look for, mapped to their Unicode names.
    suspicious = {
        '\u200B': 'Zero-Width Space',
        '\u200C': 'Zero-Width Non-Joiner',
        '\u200D': 'Zero-Width Joiner',
        '\u00AD': 'Soft Hyphen',
        '\u202A': 'Left-to-Right Embedding',
        '\u202C': 'Pop Directional Formatting',
    }

    found = []
    for char, name in suspicious.items():
        count = text.count(char)  # occurrences of this code point
        if count > 0:
            found.append(f"{name}: {count} instances")

    return found
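
Once you know what’s there, cleaning is the easy part. The sketch below removes the six code points named in this section; strip_hidden_chars is a hypothetical helper, and the character set is an assumption you may want to extend. One caution: the zero-width joiner and non-joiner do real work in some scripts (Persian, for example) and in emoji sequences, so blanket removal is only safe for text you expect to be plain English.

```python
# Code points named in this section: ZWSP, ZWNJ, ZWJ, soft hyphen,
# left-to-right embedding, pop directional formatting.
HIDDEN = "\u200B\u200C\u200D\u00AD\u202A\u202C"

# str.translate deletes any character whose ordinal maps to None.
_STRIP_TABLE = {ord(ch): None for ch in HIDDEN}


def strip_hidden_chars(text: str) -> str:
    """Return text with the hidden characters listed in HIDDEN removed."""
    return text.translate(_STRIP_TABLE)
```

Visible characters, including curly quotes and dashes, pass through untouched, so this only addresses the invisible layer.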

The Bigger Picture

These typographic fingerprints reveal something interesting about how AI learns and what it considers “normal.” The models aren’t trying to deceive anyone; they’re just reflecting the patterns in their training data. Professional publications use proper typography, so AI does too.

The invisible characters are more concerning, especially when they’re deliberately added by tools designed to evade detection. This creates an arms race between detection systems and evasion techniques, with invisible Unicode characters as the battlefield.

As AI becomes more prevalent in writing workflows, understanding these signatures becomes increasingly important. Whether you’re a teacher checking student work, an editor reviewing submissions, or just someone who wants their text to behave predictably, knowing what to look for can save you headaches down the road.

The ghost letters are out there, hiding in plain sight and in complete invisibility. Now you know where to find them.
