textutil-cmds - Utility commands for processing text
These commands support common tasks of processing text chunks in NaviServer applications.
ns_parsehtml parses the provided HTML text into a tagged list, where each list item starts with a type tag indicating the element's content. Possible type tags are comment, pi, tag, and text. For tag elements, the type tag is followed by the raw tag string and by a list consisting of the tag name and a Tcl dict containing the parsed HTML attributes (for closing tags, only the name, such as /b, is given).
When the option -noangle is specified, the angle brackets (less-than and greater-than signs) are removed from the raw tag strings in the result.
When the option -onlytags is specified, only the parsed tag elements are returned, without the leading type tag and without the raw tag string. This can be used to check whether tags in an HTML snippet need to be closed.
% ns_parsehtml {hello <b>foo</b> anchor <a href="/foo">world</a>.}
{text {hello }} {tag <b> {b {}}} {text foo} {tag </b> /b} {text { anchor }} {tag {<a href="/foo">} {a {href /foo}}} {text world} {tag </a> /a} {text .}
% ns_parsehtml -noangle {hello <b>foo</b> anchor <a href="/foo">world</a>.}
{text {hello }} {tag b {b {}}} {text foo} {tag /b /b} {text { anchor }} {tag {a href="/foo"} {a {href /foo}}} {text world} {tag /a /a} {text .}
% ns_parsehtml -onlytags {hello <b>foo</b> anchor <a href="/foo">world</a>.}
{b {}} /b {a {href /foo}} /a
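The check for unclosed tags mentioned for -onlytags can be sketched as a small stack walk over the returned list. The proc name unclosed_tags and the list of void elements are hypothetical choices for this illustration, not part of NaviServer; the sketch relies only on the result shape shown in the example above (open tags like {b {}}, closing tags like /b) and does not assume anything about how self-closing syntax such as <br/> is reported.

```tcl
# Hypothetical helper, not part of NaviServer: takes the result of
# "ns_parsehtml -onlytags" and returns the names of tags that were
# opened but not closed, in the order they were opened.
proc unclosed_tags {tags} {
    # Assumed subset of HTML void elements, which never take a closing tag.
    set void {area base br col embed hr img input link meta source track wbr}
    set pending {}
    foreach tag $tags {
        # Open tags look like {b {}}, closing tags like /b.
        set name [string tolower [lindex $tag 0]]
        if {[string index $name 0] eq "/"} {
            # Remove the most recent matching open tag, if there is one;
            # stray closing tags are ignored.
            set hits [lsearch -exact -all $pending [string range $name 1 end]]
            if {[llength $hits] > 0} {
                set last [lindex $hits end]
                set pending [lreplace $pending $last $last]
            }
        } elseif {$name ni $void} {
            lappend pending $name
        }
    }
    return $pending
}

# Inside NaviServer one might call, e.g.:
#   unclosed_tags [ns_parsehtml -onlytags $html]
```

An empty result means every non-void tag was closed; for improperly nested input such as <b><i></b>, the sketch reports i as still open.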
ns_quotehtml returns the contents of html with the characters that are special in HTML replaced by escape codes, so that the resulting text is displayed literally by an HTML renderer. Specifically:
& becomes &amp;
< becomes &lt;
> becomes &gt;
' becomes &#39;
" becomes &#34;
All other characters are unmodified in the output.
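The mapping listed above can be reproduced in plain Tcl with string map, which may be handy in scripts that run outside NaviServer. This is a sketch of the documented mapping, not the NaviServer implementation; the proc name quotehtml_sketch is hypothetical.

```tcl
# Sketch, not the NaviServer implementation: applies the documented
# ns_quotehtml mapping with plain Tcl (e.g. in a standalone tclsh).
proc quotehtml_sketch {text} {
    # string map makes a single pass over the input, so the "&" introduced
    # by one replacement is never escaped a second time.
    return [string map {& &amp; < &lt; > &gt; ' &#39; \" &#34;} $text]
}
```

Because string map never rescans replaced text, the order of the pairs does not matter here, unlike with a sequence of separate regsub calls.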
ns_unquotehtml is essentially the inverse operation of ns_quotehtml: it replaces the named entities and the numeric entities (in decimal or hexadecimal notation) contained in the provided string with the characters they represent. ASCII control characters are omitted.
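The entity decoding described above can be approximated in plain Tcl with a scanning loop (Tcl 8.6 has no regsub -command). This is a simplified sketch: the proc name unquotehtml_sketch is hypothetical, it knows only four named entities, it leaves unknown entities unchanged, and it does not model the omission of ASCII control characters.

```tcl
# Hypothetical, simplified decoder for illustration only.
proc unquotehtml_sketch {text} {
    # Assumed small subset of named entities; a full decoder knows many more.
    set named {amp & lt < gt > quot \"}
    set result ""
    set start 0
    # Find the next "&name;", "&#123;" or "&#x7B;" at or after $start.
    while {[regexp -indices -start $start \
                {&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);} \
                $text match sub]} {
        lassign $match from to
        lassign $sub sfrom sto
        # Copy the text preceding the entity unchanged.
        append result [string range $text $start [expr {$from - 1}]]
        set entity [string range $text $sfrom $sto]
        if {[regexp {^#[xX]([0-9a-fA-F]+)$} $entity -> hex]} {
            set char [format %c [scan $hex %x]]
        } elseif {[regexp {^#([0-9]+)$} $entity -> dec]} {
            set char [format %c [scan $dec %d]]
        } elseif {[dict exists $named $entity]} {
            set char [dict get $named $entity]
        } else {
            # Unknown entity: keep it as it was.
            set char [string range $text $from $to]
        }
        append result $char
        # Continue after the entity; decoded text is never rescanned.
        set start [expr {$to + 1}]
    }
    append result [string range $text $start end]
    return $result
}
```

Since decoded characters are not rescanned, &amp;lt; decodes to the literal text &lt; rather than to <.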
ns_striphtml returns the contents of html with all HTML tags removed. It also replaces all known HTML4 named entities and numeric entities in decimal or hexadecimal notation with their UTF-8 representations, and it removes HTML comments. ASCII control characters are omitted.
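The tag and comment removal part can be sketched with two regsub calls. This hypothetical striphtml_sketch proc does not decode entities, and regular expressions do not fully understand HTML (for example, a > inside a quoted attribute value ends the match early), so it is only an approximation of the behavior described above.

```tcl
# Rough sketch of tag and comment removal only; entity decoding is omitted.
proc striphtml_sketch {html} {
    # Remove comments first, so that markup inside comments vanishes too.
    # ".*?" is non-greedy, so each comment ends at the first "-->".
    regsub -all {<!--.*?-->} $html "" html
    # Remove anything that looks like a tag.
    regsub -all {<[^>]*>} $html "" html
    return $html
}
```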
ns_reflow_text reflows text to the specified line width. The arguments width (default 80) and offset (default 0) are integers specifying numbers of characters. The prefix, when given, is prepended as a constant string to every resulting line.
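A greedy line-filling loop illustrates the basic idea of reflowing. In this hypothetical reflow_sketch proc, the width counts only the text after the prefix, and the offset argument is not modeled; both are assumptions of the sketch, not statements about ns_reflow_text.

```tcl
# Hypothetical greedy reflow: fills each line with as many words as fit.
proc reflow_sketch {text {width 80} {prefix ""}} {
    set lines {}
    set line ""
    # Split the input into words on any run of whitespace.
    foreach word [regexp -all -inline {\S+} $text] {
        if {$line eq ""} {
            set line $word
        } elseif {[string length $line] + 1 + [string length $word] <= $width} {
            append line " " $word
        } else {
            # The word does not fit: emit the current line, start a new one.
            lappend lines $prefix$line
            set line $word
        }
    }
    if {$line ne ""} {
        lappend lines $prefix$line
    }
    return [join $lines \n]
}
```

With width 15 and prefix "> ", this sketch yields the same four lines as the ns_reflow_text example in this page.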
ns_trim performs a multi-line trim with an optional delimiter or prefix. The command is useful when the indentation from the source code file (which depends on the nesting level) should not be preserved in the output, for example in SQL statements or HTML markup.
When neither -delimiter nor -prefix is specified, all leading whitespace is stripped from each line. When -delimiter is specified, the delimiter is stripped as well. The delimiter must be a single character.
When -prefix is used, the specified string is stripped from lines starting exactly with this prefix (for example, use -prefix >> to strip the prefix >> from every line starting with it). This option is mutually exclusive with -delimiter.
Optionally, substitution can be applied to the text before trimming; this is not strictly needed but sometimes convenient.
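The three trimming modes can be approximated line by line in plain Tcl. This hypothetical trim_sketch proc does not model substitution; keeping the text after the delimiter unchanged (including a following space) and leaving whitespace alone in -prefix mode are assumptions of the sketch, not confirmed ns_trim behavior.

```tcl
# Sketch only: trim_sketch ?-delimiter c? ?-prefix p? text
proc trim_sketch {args} {
    set text [lindex $args end]
    set delimiter ""
    set prefix ""
    foreach {opt value} [lrange $args 0 end-1] {
        switch -- $opt {
            -delimiter { set delimiter $value }
            -prefix    { set prefix $value }
            default    { error "unknown option \"$opt\"" }
        }
    }
    if {$delimiter ne "" && $prefix ne ""} {
        error "-delimiter and -prefix are mutually exclusive"
    }
    if {[string length $delimiter] > 1} {
        error "the delimiter must be a single character"
    }
    set lines {}
    foreach line [split $text \n] {
        if {$prefix ne ""} {
            # Strip the prefix only from lines that start exactly with it.
            if {[string first $prefix $line] == 0} {
                set line [string range $line [string length $prefix] end]
            }
        } else {
            # Strip leading whitespace, then an optional delimiter character.
            set line [string trimleft $line]
            if {$delimiter ne "" && [string index $line 0] eq $delimiter} {
                set line [string range $line 1 end]
            }
        }
        lappend lines $line
    }
    return [join $lines \n]
}
```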
% ns_quotehtml "Hello World!"
Hello World!
% ns_quotehtml "The <STRONG> tag is used to indicate strongly emphasized text."
The &lt;STRONG&gt; tag is used to indicate strongly emphasized text.
% ns_quotehtml {<span class="foo">}
&lt;span class=&#34;foo&#34;&gt;
% ns_reflow_text -width 15 -prefix "> " "one two three four five six seven eight nine ten"
> one two three
> four five six
> seven eight
> nine ten
% ns_striphtml "<MARQUEE direction='right'><BLINK>Hello World!</BLINK></MARQUEE>"
Hello World!
% ns_trim {
    SELECT object_id, object_name
    FROM acs_objects
    WHERE object_id > 10000
}
SELECT object_id, object_name
FROM acs_objects
WHERE object_id > 10000
% ns_trim -delimiter | {
    | <ul>
    | <li> one
    | <li> two
    | <li> three
    | </ul>
}
<ul>
<li> one
<li> two
<li> three
</ul>
% ns_trim -prefix "> " {
> line 1
> line 2
}
line 1
line 2