untaint, taint, apply-taint. Transforming data

^taint[text]
^taint[transformation type][text]
^untaint{code}
^untaint[transformation type]{code}
^apply-taint[text]   [3.4.1]
^apply-taint[transformation type][text]   [3.4.1]

Parser automatically transforms data to protect your system against intrusion, and the default security level is high. This works even if your code contains no taint/untaint operators. If you interfere by using these operators (especially with the as-is transformation), you may introduce security vulnerabilities. Therefore, study the mechanism carefully before writing code.

Operator taint marks the text received as "needing transformation of a certain type". If transformation type is unspecified, taint marks it as "tainted" (needing undefined transformation). Text marked "tainted" is subject to the same type of transformation that is applied to external text (coming from form fields, a database, a file, cookies, etc.).

Operator untaint executes the code received and marks the tainted parts of the execution result as "needing transformation of a certain type". Tainted parts are the pieces that were not typed as Parser code within the document body: either external data or text marked "tainted" by the taint operator. Parts already marked for a specific transformation type are not affected. If transformation type is unspecified, untaint marks the tainted pieces of the execution result as as-is.

Marking only schedules the transformation. It is performed later: when the apply-taint operator is called, or when the text is output to the browser, sent to an SQL server, saved to a file, sent by e-mail, etc.

Operator apply-taint transforms all tainted parts of the string in place. Parts with an undefined transformation type are processed using the specified transformation type (as-is by default).

For simplicity you can think about it as if Parser interprets external characters as ^taint[external text], and text within the body as ^taint[optimized-as-is][typed text].

In some cases ^taint[transformation type][text] and ^untaint[transformation type]{text} produce the same result. It happens when the whole text is tainted (for example, $form:field). However, keep in mind that these operators have different default parameters, and applying both without transformation types to a tainted text will create absolutely different results.
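
The difference in defaults can be sketched as follows. This is a minimal illustration that follows only from the rules described above; it assumes the page is requested with a field parameter that contains tags.

```parser3
# $form:field is external text, so the whole value is tainted.

# taint without a type: the value stays "tainted",
# so the automatic optimized-html transformation is applied on output
Default taint: '^taint[$form:field]'<br />

# untaint without a type: the value is marked as-is,
# so it is sent to the browser without transformation
Default untaint: '^untaint{$form:field}'<br />
```

The second line is the risky one: any tags in the field reach the browser unchanged.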

When outputting to browser, Parser automatically applies type optimized-html, and the code looks like this:
^untaint[optimized-html]{typed code}

It means that if you write $form:field (not using taint/untaint) within the body, then even if "?field=</html>" is requested, the page will not be broken by the closing tag </html> appearing too early, because the content of $form:field is tainted and will be subjected to the automatic optimized-html transformation, which replaces less-than and greater-than signs ('<' and '>') with the entity references '&lt;' and '&gt;'.
Other automatic transformations are performed in the same way. For instance, an SQL query containing ^string:sql{SELECT name FROM table WHERE uid = '$form:uid'} (again, not using taint/untaint) cannot be subverted by SQL injection using the parameter "?uid=' OR 1=1 OR '", because Parser escapes the single quotes in the received $form:uid before sending the query to the server.

Text within the body is also transformed automatically. Parser optimizes whitespace characters: spaces, tabs and line breaks. A run of several such characters is replaced with the first one of them. In other words, if you type several spaces in a row, only one reaches the output. If you need to disable this optimization (for example, inside <pre>), do it explicitly, for instance like this:

<pre>
^taint[as-is][
   I strode off the
   high cathedral
    top-most step like a
     miracle worker, or a
      Blessed
       passing the final exam for
        Saint. The
         city expanded at my
          feet. For one
           pico-second, I
            flew.
]
</pre>

In this case, you must use taint, as the typed characters are untainted and untaint would not produce any effect.

Example
$clean[<br />]
# the above expression is equivalent to this: $clean[^taint[optimized-as-is][<br />]]

$tainted[^taint[<br />]]

Strings: ^if($clean eq $tainted){match}{do not match}<br />

Tainted data-'$tainted'<br />
Untainted data-'$clean'<br />

This example shows that although the comparison reports the strings as equal, a browser will display different results: the untainted string is not transformed, whereas '<' and '>' in the tainted one are replaced with '&lt;' and '&gt;'.

Example
Example using ^untaint.<br />
<form>
<input type="text" name="field" />
<input type="submit" />
</form>

$tainted[$form:field]
Tainted data-'$tainted'<br />
Untainted data-'^untaint{$tainted}'

Transformation type for untaint is specified inside square brackets. Here it is omitted, which means the default parameter as-is is used. Note that while untaint with an unspecified transformation type is equivalent to untaint with the as-is transformation, there is no transformation type that is equivalent to taint with an unspecified type.

Example
Example ^taint.<br />
$city[New York]
<a href="city.html?city=^taint[uri][$city]">$city</a>

As a result, the contents of variable city are transformed using the uri type. Cyrillic characters, white spaces and other characters that must be encoded are replaced with their hex codes in the form %XX.

Example
Example, illustrating difference between ^taint and ^untaint.<br />
$s[?   ^taint[?]   ^taint[uri][?]   ^taint[file-spec][?]]
<pre>^apply-taint[uri][$s]
^apply-taint[uri][^taint[as-is][$s]]
^apply-taint[uri][^untaint{$s}]
^apply-taint[uri][^untaint[uri]{$s}]</pre>

Output:
?    %3F    %3F    _3F
?    ?    ?    ?
?    ?    %3F    _3F
?    %3F    %3F    _3F

Example
Outputting and saving user submitted data and generating XML<br />
You specify: '$form:field'

^connect[$SQL.connect-string]{
   ^void:sql{INSERT INTO news (body) VALUES ('$form:field')}
}

$doc[^xdoc::create{<?xml version="1.0" encoding="UTF-8"?>
<root>
   <data>$form:field</data>
</root>
}]

In this case, you need neither taint nor untaint, as all the necessary transformations will occur automatically with transformation type optimized-html for output to browser, sql for sending data to server and xml for generating xdoc object.
Note that you also do not need to write taint/untaint in SQL queries when saving data to a database using administrative interface.

Example
Outputting user submitted data or data coming from a database (may contain tags) to an edit form<br />
^if(def $form:body){
   $body[$form:body]
}{
   ^connect[$SQL.connect-string]{
      $body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
   }
}
<textarea>$body</textarea>

In this example optimized-html transformation will be performed automatically, because the data submitted by the user or coming from a database are tainted. If the data contains any tags, they will not affect the page. Remember that sequences of white spaces in $body will be optimized during output.

Example
Outputting data coming from a database containing administrator written tags<br />
^connect[$SQL.connect-string]{
  $body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
}
^taint[as-is][$body]

Here you should use taint specifying transformation type as-is (or untaint specifying this type), because the tags included in the news text by the administrator must not undergo any transformation. This method must not be used for data submitted by visitors to the website, such as guest book entries, forum posts, etc.

Example
Outputting user submitted data or data coming from a database (may contain tags) to an edit form keeping spacing symbols<br />
^if(def $form:body){
  $body[$form:body]
}{
  ^connect[$SQL.connect-string]{
     $body[^string:sql{SELECT body FROM news WHERE news_id = $id}]
  }
}
<textarea>^taint[html][$body]</textarea>

In this case, use taint specifying transformation type html (or untaint with this type) to avoid crippling the page and to disable optimization of space characters.

In the above examples operator taint was used only three times: for displaying administrator added tags in database-derived text, for disabling optimization of spacing symbols, and for outputting query string containing encoded characters (for example, white spaces and Cyrillic letters).
Otherwise, there was no need for taint/untaint, and Parser managed everything on its own.

Remember that it is better not to use these operators unless necessary.

You might have noticed that the practical examples above got by without untaint. This raises the question of its usefulness. Here are a couple of practical examples.

Firstly, it sometimes helps to reduce the number of taint operators in the code, for example when outputting data to a multi-field form with whitespace optimization disabled. In this case, you can apply ^untaint[html]{…} to the whole form instead of writing ^taint[html][…] for each textarea value.

Example
Outputting user submitted data or data coming from a database (may contain tags) to a large edit form, keeping spacing symbols<br />
^if(def $form:title){
   $data[$form:fields]
}{
  ^connect[$SQL.connect-string]{
      $data[^table::sql{SELECT title, lead, body FROM news WHERE news_id = $id}]
   }
}

^untaint[html]{
   <p>
      <b>Heading</b><br />
      <textarea name="title">$data.title</textarea>
   </p>
   <p>
      <b>Announcement:</b><br />
      <textarea name="lead">$data.lead</textarea>
   </p>
   <p>
      <b>News</b><br />
      <textarea name="body">$data.body</textarea>
   </p>
}

Secondly, you can use it to output xml to browser (for instance, for ajax, RSS, SOAP, etc.). In this situation optimized-html is not appropriate, and you must enclose the code in ^untaint[optimized-xml]{…} to ensure correct output.
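
A minimal sketch of this second case. The element names and the form field title are hypothetical and serve only as an illustration; setting the response content type is not shown.

```parser3
# Everything inside the braces is executed as usual.
# The tainted part (the external value of $form:title) is marked optimized-xml,
# so the automatic optimized-html transformation no longer applies to it.
^untaint[optimized-xml]{<?xml version="1.0" encoding="UTF-8"?>
<items>
   <item>
      <title>$form:title</title>
   </item>
</items>}
```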

A transformation replaces some characters with others, according to built-in transformation tables. The following types of transformation are available:

as-is
file-spec
http-header
mail-header
uri
sql
js
json   [3.4.1]
parser-code   [3.4.0]
regex   [3.1.5]
xml
html

optimized-as-is
optimized-xml
optimized-html

Transformation table

as-is	no transformation
file-spec	characters * ? " < > | are replaced with _XX, where XX is the character's hex code
uri	characters other than numbers or lower/uppercase Latin letters as well as characters _ - . " are replaced with %XX, where XX is a character's hex-code
http-header	the same as URI
mail-header	if charset is known (if not, upper/lowercase will not work), the fragment starting with the eighth-bit first letter and until the end of the string will be represented in such a way: Subject: Re: parser3: =?koi8-r?Q?=D3=C5=CD=C9=CE=C1=D2?=
sql	depends on the SQL server: for Oracle, ODBC and SQLite, ' is replaced with ''; for PgSQL, characters ' and \ are prefixed with \; for MySQL, characters ' " and \ are prefixed with \, and characters with codes 0x00 0x0A 0x0D are replaced with \0 \n \r. For this transformation to work, the code that triggers it must be located inside the ^connect[]{} operator.
js	" is replaced with \"; ' is replaced with \'; \ is replaced with \\; the newline character is replaced with \n; the character with code 0xFF is preceded by \
json	characters " \ / are prefixed with \; the newline character is replaced with \n; the tab character is replaced with \t; characters with codes 0x08 0x0C 0x0D are replaced with \b \f \r; for non-UTF-8 output, all Unicode characters are replaced with \uXXXX
regex	characters \ ^ $ . [ ] | ( ) ? + { } - * are prefixed with \
parser-code	special characters are prefixed by ^
xml	& is replaced with &amp;, > is replaced with &gt;, < is replaced with &lt;, " is replaced with &quot;, ' is replaced with &apos;
html	& is replaced with &amp;, > is replaced with &gt;, < is replaced with &lt;, " is replaced with &quot;
optimized-as-is optimized-xml optimized-html	in addition to the replacements of the corresponding base type, optimizes whitespace (space, tab, newline characters): a run of several such characters is replaced with a single one, the first in the run
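
As an illustration of choosing a type from this table, here is a hedged sketch that places external text inside a JavaScript string literal. The field name and the JavaScript variable are hypothetical. According to the table, the js type handles quotes, backslashes and newlines; the sketch makes no claim about other characters such as '<'.

```parser3
# $form:name is external text, so its value is tainted.
# taint with type js marks it for the js transformation instead of
# the optimized-html transformation that would otherwise apply on output.
<script>
   var userName = "^taint[js][$form:name]";
</script>
```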

A number of taint transformations are made automatically. Thus, names of files and paths are always automatically transformed with file-spec and when you write…

^file::load[filename]

…Parser executes…

^file::load[^taint[file-spec][filename]]

Similarly, when HTTP-headers and mail headers are defined, Parser executes http-header and mail-header transformations respectively. During DOM-operations, text parameters of all methods are automatically xml-transformed.

Parser also performs a number of automatic untaint transformations:
type	what is transformed
sql	body of SQL-query
xml	XML code, when an object of class xdoc is created
optimized-html	page output to browser
regex	REGEX-patterns
parser-code	body of operator process

Last updated: 25.01.2021
