CSVfox
Leverage Your Data.

Character Escaping and Placeholders

Defusing special characters in CSVfox commands

Hide special characters from expression resolving

A handful of characters have special meanings in CSVfox commands, or in command line and shell parameters.

If you need to use one of these special characters as a literal in text or in an expression, it must be "escaped". Escaping hides the character's special meaning, so the application does not misinterpret it when the expression is resolved.
Once resolving is complete, the escaping is removed and only the intended character remains.
 

Escaping of Characters

A character is escaped by placing a backslash (\) in front of it.

Any of the special characters in the table below can be "hidden" from erroneous resolving by typing a backslash in front of it.

Example:
This expression:  [Fish]  stands for the column named "Fish". The square brackets mark it as a column identifier, so it is replaced by the current text contents of the "Fish" column each time it appears in a resolvable expression (see Expressions).
But this expression:  \[Fish\]  (with "defused" square brackets!) stands for the literal text "[Fish]" and is resolved to exactly that text.

For a literal backslash "\", the backslash itself must be duplicated: \\ .
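
The difference between [Fish] and \[Fish\] can be illustrated with a small Python sketch. This is not CSVfox's actual implementation: the function name resolve, the row dictionary, and the single-pass approach are assumptions made only for illustration, and numeric or text expressions as well as control character sequences are deliberately ignored.

```python
import re

# Characters listed in the "Interpreted Characters" table of this page.
SPECIAL = "[](){}<>\\/=*#@,-"

# Matches either a backslash followed by one special character,
# or an unescaped [Column] reference.
TOKEN = re.compile(r"\\([" + re.escape(SPECIAL) + r"])|\[([^\[\]\\]+)\]")


def resolve(expression, row):
    """Replace [Column] references with row values and strip escaping.

    Hypothetical helper: 'row' maps column names to their current text.
    """
    def substitute(match):
        if match.group(1) is not None:
            # Escaped character: drop the backslash, keep the literal.
            return match.group(1)
        # Column identifier: look up the current contents.
        return row[match.group(2)]

    return TOKEN.sub(substitute, expression)


row = {"Fish": "Cod"}
print(resolve("[Fish]", row))       # column contents
print(resolve(r"\[Fish\]", row))    # literal text with brackets
```

Because the text is scanned once from left to right, a doubled backslash is consumed as a unit and yields a single literal backslash.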

Tables of Affected Characters

There are various reasons why a character cannot be typed directly into the command.

Interpreted Characters

These characters are frequently part of the CSVfox command syntax. Escaping them is done to allow using them in literal text without interfering with the command itself. Technically, they will be "hidden" from the command interpreter and "revealed" again later.

Sequence  Character  Why?
\[        [          Start of column or variable identifier
\]        ]          End of column or variable identifier
\(        (          Start of a numeric expression
\)        )          End of a numeric expression
\{        {          Start of a text expression
\}        }          End of a text expression
\<        <          Operator, see also \L
\>        >          Operator, see also \G
\\        \          Backslash, used for "escaping"
\/        /          Operator, command or arguments list delimiter
\=        =          Assignment operator, command/argument separator
\*        *          Operator
\#        #          Operator
\@        @          Operator, or argument expressions separator
\,        ,          Part of a column list expression
\-        -          Part of a column list expression

It is generally not necessary to escape every one of these characters, only those "in ambiguous situations" (with the exception of brackets, which should always be escaped!). If you see an error message or unwanted effects on the command line, check for these characters and use the escape sequences instead.
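
When literal text is generated by a script, the escaping can be automated. The following Python sketch prepends a backslash to every character from the table above. The helper name escape_literal is hypothetical and not part of CSVfox, and the sketch assumes that escaping a listed character is harmless even where it would not strictly be required.

```python
# Characters listed in the "Interpreted Characters" table of this page.
SPECIAL = "[](){}<>\\/=*#@,-"


def escape_literal(text):
    """Return text with a backslash in front of every special character.

    Hypothetical helper: it escapes conservatively, i.e. also in
    situations that would not be ambiguous.
    """
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(escape_literal("[Fish]"))   # backslash before each bracket
print(escape_literal("a\\b"))     # the backslash itself is doubled
```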

Control Character Sequences

The following sequences can be used in any literal text, on the command line, in conditions, and in expressions. They will be converted to the respective control characters when resolving.

These sequences will NOT be converted
- in file name parameters of the commands +in, -out, %merge, %log, and %params, because the sequence might be a regular part of the file path.
- in the Regular Expression pattern of the ±regex command and in the right operand of the Regex rule "@" of Conditions.

Control Character Sequences
Sequence  Character  Name
\a        0x07       Bell (Alert)
\b        0x08       Backspace
\t        0x09       Tab
\n        0x0A       New Line
\v        0x0B       Vertical Tab
\f        0x0C       Form Feed
\r        0x0D       Carriage Return
\e        0x1B       Escape (27)
\s        0x20       Space Character*
\P        |          "Pipe" Character*
\L        <          Redirect input*
\G        >          Redirect output*


* These are not control characters in the strict sense, but the sequences can be used on the command line to insert a space, a pipe, or a redirect symbol, which would otherwise split and invalidate the argument.
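
The mapping in the table can be expressed as a lookup in Python. This is only a sketch of the conversion, not CSVfox's own code: the names CONTROL_SEQUENCES and expand_control_sequences are assumptions, and sequences that are not in the table (such as escaped special characters) are simply left untouched here.

```python
import re

# Sequence letter -> resulting character, taken from the table above.
CONTROL_SEQUENCES = {
    "a": "\x07", "b": "\x08", "t": "\x09", "n": "\x0A",
    "v": "\x0B", "f": "\x0C", "r": "\x0D", "e": "\x1B",
    "s": " ", "P": "|", "L": "<", "G": ">",
}


def expand_control_sequences(text):
    """Convert the backslash sequences of the table to their characters.

    Hypothetical helper: unknown sequences are returned unchanged.
    """
    def substitute(match):
        # Fall back to the full match when the letter is not in the table.
        return CONTROL_SEQUENCES.get(match.group(1), match.group(0))

    # Scanning left to right consumes a doubled backslash as one unit,
    # so the letter after it is not mistaken for a sequence.
    return re.sub(r"\\(.)", substitute, text, flags=re.DOTALL)


print(expand_control_sequences(r"x\sy\Pz"))
```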

Back References

These character sequences occur solely in the ±regex command, as the last part of the "Original/Pattern/Replacement" argument.
They stand for back references, i.e. the parts of the Original that were matched by the Pattern. For Regular Expressions, see also https://en.wikipedia.org/wiki/Regular_expression

The matched groups are referenced in the Replacement part by:

\0, \1 - \9 or
$0, $1 - $9,

where \0 or $0 stands for the whole matched expression, and the others stand for the matching subgroups in their order.
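
To show how such back references behave, the following Python sketch translates both notations into the \g<n> form used by Python's re module and then performs the replacement. It is an illustration only: the function names are hypothetical, and CSVfox's regular expression dialect may differ from Python's in other details.

```python
import re


def to_python_replacement(replacement):
    """Rewrite \\0..\\9 and $0..$9 as Python's \\g<0>..\\g<9>."""
    return re.sub(
        r"[\\$]([0-9])",
        lambda m: "\\g<" + m.group(1) + ">",
        replacement,
    )


def regex_replace(original, pattern, replacement):
    """Apply Pattern to Original and build the result from Replacement.

    Hypothetical helper mirroring the Original/Pattern/Replacement idea.
    """
    return re.sub(pattern, to_python_replacement(replacement), original)


# Swap two subgroups, then wrap the whole match in brackets.
print(regex_replace("10-20", r"(\d+)-(\d+)", r"\2-\1"))
print(regex_replace("10-20", r"(\d+)-(\d+)", "[$0]"))
```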

Extended Character Placeholder

Purpose

In order to allow entering arbitrary literal character values that are not available on the keyboard, a special extended character placeholder is supported.
It looks like this:
{U+XXXXXX}
where each X is a hexadecimal digit: "0" to "9", "A" to "F", or "a" to "f".

Between 1 and 6 digits are allowed, and the largest allowed hexadecimal number is 10FFFF, i.e. {U+10FFFF}. This is the highest code point that Unicode defines (which, however, is generally not in use).
This placeholder uses the Unicode code point numbering.

How to use it

Which characters can ultimately be used depends on the output file encoding. For example, if this encoding is ASCII, there is no point in inserting any characters beyond the ASCII range.

Examples:
The space character can be written as {U+000020}, or simply as {U+20}. The lowercase letter "a" is at {U+61}.
The Greek capital "Sigma" Σ, also known as the summation operator, can be inserted using {U+03A3}.
And a Chinese character for "house" is 屋, which can also be written as {U+5C4B}.

These placeholders will be replaced before any other resolving takes place. So they can be used as part of column names as well as in arbitrary text.
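
The placeholder rules above (1 to 6 hexadecimal digits, upper or lower case, at most 10FFFF) can be sketched in Python as follows. The function name expand_placeholders is hypothetical, and leaving an out-of-range placeholder unchanged is an assumption; how CSVfox itself reacts to such a value is not stated here.

```python
import re

# {U+X} to {U+XXXXXX}, hexadecimal digits in either case.
PLACEHOLDER = re.compile(r"\{U\+([0-9A-Fa-f]{1,6})\}")
MAX_CODE_POINT = 0x10FFFF


def expand_placeholders(text):
    """Replace every {U+XXXXXX} placeholder with its Unicode character.

    Hypothetical helper: values above 10FFFF are left as they are.
    """
    def substitute(match):
        code_point = int(match.group(1), 16)
        if code_point > MAX_CODE_POINT:
            return match.group(0)
        return chr(code_point)

    return PLACEHOLDER.sub(substitute, text)


print(expand_placeholders("Sum: {U+03A3}, house: {U+5C4B}"))
```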
 
