Which characters need to be escaped in HTML?

If you’re inserting text content into your document in a location where text content is expected[1], you typically only need to escape the same characters as you would in XML. Inside an element, this means just the ampersand (&), which starts an entity reference, and the less-than and greater-than signs (< >), which delimit elements:

& becomes &amp;
< becomes &lt;
> becomes &gt;
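As a minimal sketch of the three replacements above, here is how they might look in Python using the standard library's html.escape with quote=False, so that only &, < and > are touched. The helper name escape_text is my own, purely for illustration.

```python
from html import escape


def escape_text(text: str) -> str:
    """Escape &, < and > for use as text content inside an element."""
    # quote=False leaves " and ' alone; those only matter in attribute values.
    return escape(text, quote=False)


print(escape_text("1 < 2 && 3 > 2"))
# prints: 1 &lt; 2 &amp;&amp; 3 &gt; 2
```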

Inside of attribute values you must also escape the quote character you’re using:

" becomes &quot;
' becomes &#39;

In some cases it may be safe to skip escaping some of these characters, but I encourage you to escape all five in all cases to reduce the chance of making a mistake.
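If you want to escape all five characters in one place, a small helper is enough. The following is a sketch in Python, assuming the replacements listed above; the names escape_html and _HTML_ESCAPES are hypothetical. It uses str.translate, which makes a single pass over the input, so the ampersands it produces are never escaped a second time.

```python
# Map each of the five special characters to its escaped form.
_HTML_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})


def escape_html(text: str) -> str:
    """Escape all five characters, for element content or quoted attribute values."""
    return text.translate(_HTML_ESCAPES)


user_input = '<a href="x">Tom & \'Jerry\'</a>'
print(escape_html(user_input))
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; &#39;Jerry&#39;&lt;/a&gt;
```

Python's built-in html.escape does much the same thing by default, except that it writes the single quote as &#x27;, which is an equivalent numeric reference.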

If your document encoding does not support all of the characters that you’re using, such as if you’re trying to use emoji in an ASCII-encoded document, you also need to escape those. Most documents these days are encoded using the fully Unicode-supporting UTF-8 encoding where this won’t be necessary.
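For the unsupported-character case, here is a rough Python sketch: the built-in "xmlcharrefreplace" error handler replaces any character the target encoding can't represent with a numeric character reference. The function name to_ascii_html is hypothetical.

```python
def to_ascii_html(text: str) -> str:
    """Replace characters that ASCII can't represent with numeric character references."""
    # Characters outside ASCII become &#NNNN; (decimal code point); the rest pass through.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("I \u2665 HTML \U0001F600"))
# prints: I &#9829; HTML &#128512;
```

Note that this does not escape &, <, >, or quotes. If you need both, escape those characters first and apply this step afterwards, so that the ampersands in the generated references are not escaped again.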

In general, you should not escape spaces as &nbsp;. &nbsp; is not a normal space, it’s a non-breaking space. You can use these instead of normal spaces to prevent a line break from being inserted between two words, or to insert   extra   space   without it being automatically collapsed, but this is rarely needed. Don’t do this unless you have a design constraint that requires it.
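To make the difference concrete, this short Python sketch decodes the non-breaking space entity with the standard library's html.unescape and compares it to an ordinary space. The constant name NBSP is just for illustration.

```python
from html import unescape

# Decode the entity to the actual character it stands for.
NBSP = unescape("&nbsp;")

print(NBSP == "\u00a0")     # prints: True
print(NBSP == " ")          # prints: False
print(ord(NBSP), ord(" "))  # prints: 160 32
```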

[1] By “a location where text content is expected”, I mean inside of an element or quoted attribute value where normal parsing rules apply. For example: <p>HERE</p> or <p title="HERE">...</p>. What I wrote above does not apply to content that has special parsing rules or meaning, such as inside of a script or style tag, or as an element or attribute name. For example: <NOT-HERE>...</NOT-HERE>, <script>NOT-HERE</script>, <style>NOT-HERE</style>, or <p NOT-HERE="...">...</p>.