PHP Markdown's no-markup mode

I’ve been contacted many times by people asking how to disable HTML within PHP Markdown. Up until a few months ago, I was opposed to offering that possibility on the grounds that HTML is part of the Markdown syntax. After all, Markdown was designed so that if the syntax doesn’t have what you want, or if you just don’t know the syntax, you can fall back to HTML.

Removing HTML support in that context would mean more pressure to implement a Markdown-specific syntax for things that Markdown (with HTML) doesn’t really need. It also forces people to learn the syntax because they can’t use HTML anymore. The better option, it seems to me, is to simply restrict which HTML tags and attributes can be used within Markdown.

But I kept receiving questions about how best to disable HTML within PHP Markdown, and for various reasons many weren’t much impressed by my arguments. And the technical answer wasn’t very straightforward: unless you want to sacrifice code blocks, code spans, and automatic links, you just can’t escape the less-than character (<) in advance to keep it from opening a tag.
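To make the problem concrete, here is a minimal sketch of the naive approach. It uses plain PHP only and doesn't call PHP Markdown, and the sample text is a made-up example. It simply shows what the parser would receive after every less-than sign has been escaped up front.

```php
<?php
// Hypothetical comment text containing an automatic link and a code span.
$text = 'See <http://example.com/> and wrap words in `<em>` for emphasis.';

// The naive "fix": escape every less-than sign before Markdown runs.
$escaped = str_replace('<', '&lt;', $text);

echo $escaped, "\n";
// See &lt;http://example.com/> and wrap words in `&lt;em>` for emphasis.
```

The automatic link no longer starts with a less-than sign, so Markdown has nothing to recognize and leaves it as plain text. Inside the code span, Markdown encodes the ampersand itself, so the reader ends up seeing a literal `&lt;em>` instead of the tag that was typed.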

Basically, many people implemented it wrong without even noticing (because they don’t make much use of automatic links, code blocks, or code spans). It appeared to me that this was more harmful to users trying to learn Markdown than the lack of an HTML fallback. So I changed my stance on the problem and decided to help those who want to disable HTML completely.

If you want to disable HTML in PHP Markdown, please don’t hack.

PHP Markdown has a (hidden) setting in its latest version to do exactly that. Just instantiate a parser yourself and set the no_markup property this way:

$parser = new Markdown_Parser; // or MarkdownExtra_Parser
$parser->no_markup = true;
$html = $parser->transform($text);

There’s also a no_entities property you can set the same way if you want to disallow character entities.
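If you want both restrictions at once, the two properties can be set together. The sketch below wraps that in a small helper. The function name render_without_html is hypothetical and not part of PHP Markdown; it only assumes that the object it receives exposes the no_markup and no_entities properties and the transform method shown above.

```php
<?php
// Hypothetical helper, not part of PHP Markdown: it configures whatever
// parser object it is given and returns the rendered HTML.
function render_without_html($parser, $text)
{
    $parser->no_markup = true;   // treat tags as plain text
    $parser->no_entities = true; // treat character entities as plain text
    return $parser->transform($text);
}

// Usage, assuming PHP Markdown has already been loaded:
// $html = render_without_html(new Markdown_Parser, $text);
```

Passing the parser in as an argument keeps the helper usable with either Markdown_Parser or MarkdownExtra_Parser.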

Note that by forbidding HTML markup you’re denying the users of your script, CMS, or web application the necessary fallback for elements Markdown does not provide, such as <sup> and <sub>, <ins> and <del>, <q>, <bdo>, <abbr>, <object> (required to embed video), and many others. A better idea may be to just filter unwanted tags out of the HTML output, using something such as kses, but I’ll let you be the judge of what’s best for you and your users.

With power comes responsibility: please make sure your users have the best Markdown experience you can offer them. Thanks.


Comments

Andreas

I’ve been using markdown for a couple of years now and I actually implemented my own solution to disallow HTML using htmlentities. Like you mentioned, autolinks and, in the case of htmlentities, blockquotes stop working, but I fixed the blockquotes using a simple regexp.

It’s great that markdown now has built-in support for this, but will it actually remove all tags or turn them into entities?
I think that if a user enters HTML in a form with the instruction that no HTML is allowed, s/he will expect his/her HTML to still show up in his/her comment (but as entities, not as real HTML). Otherwise there’s no way of inserting code in a comment? Or?

The latest post on my site is actually about how I tweaked markdown for my site. But after reading “If you want to disable HTML in PHP Markdown, please don’t hack.” I feel a bit foolish now.. :|

Btw, thanks for doing this work. Markdown is amazing!

Michel Fortin

The no_markup property will disable all tags; the no_entities property will disable all character entities. Set both to true if you wish to disable HTML completely. If it doesn't work, it's a bug; please let me know.

Do you mean to say there's no other way to add code to comments? What about using Markdown’s code blocks and code spans?

Andreas

Oh yeah, that’s true. But it would be kind of nice to have something like an htmlentities = true setting so that instead of removing tags they would simply be “entitized”.

Sorry about the double-post before btw.

Michel Fortin

Well, when I say disable tags and entities, it means that Markdown sees them as regular text (not as tags or entities) and so generates the required HTML character entities for HTML character data, just as when you write 1 < 2 or AT&T. Perhaps you should give it a try.


  • © 2003–2024 Michel Fortin.