±split

Splits a CSV column into two or more columns

Splitting a column into two
Splitting a column into more than two new columns
Using more separators for more fields
Keeping the original column

This command splits a column into two or more columns, by replacing it by the new columns at the same place.

The basic principle is simple:

±split[Current Column]@[New Column],[New Column],...=separator

The new column names can be given as a comma-separated list, and/or as a column name range (see below).
The separator(s) past the Equal sign = control at which places the column will be splitted.

When column names are not used for output (i.e., -colnames has been set to "false"), the number of new columns in round brackets can be given instead. They will then have "empty" column names.

Splitting Modes

There are four modes of splitting, and each one works from left or from right. The splitting mode is defined by the command mode.

Separating columns from left to right:

The column content will be searched for the separator string from start to end.

split, or split/s. Splitting takes place at the first occurence of the separator string.(see below). The left section will be put into the first column, the remainder will be moved to the right. The separator string itself will be lost.
split/l (split left). Ditto, but the separator string will be appended to the left column.
split/r (split right). The separator string will be prepended to the right column.
split/n=n (split at position). Split after the n'th character of the column (i.e., left column width is n).

Separating columns from right to left:

The column content will be searched backwards for the separator string from end to start.

split/-s Splitting takes place at the last occurence of the separator string. The last section will be put into the rightmost column. The remainder will stay at left. The separator string itself will be lost.
split/-l (split left). Ditto, but the separator string will be appended to the left column.
split/-r (split right). The separator string will be prepended to the right column.
split/-n=n (split at position). Split before the n'th-right character of the column (i.e., right column width is n).

The separator

The separator string is defined by the command parameter after the "=". The content behind (or in front of) the separator will be transferred to the new column. If no separator is given, then the " " (whitespace) is assumed.
If the separator is not found in the column contents, the first new column gets the whole contents, and the other new column(s) remain empty.
If the mode is numeric (/n or /-n), there must be a number (or a series of numbers, see below) past the "=". That number is not a separator string, but it denotes the position where to split.

Splitting a column into two

This is most simple. Just name the existing column, the two new columns and the desired separator or position where to split, and the denoted column will be replaced by the two new ones.
If Column names are not used, then just the number of new columns can be specified (in round brackets) instead of their names.

Pattern

±split[CurrentColumn]@[NewColumn],[NewColumn] ±split[CurrentColumn]@[NewColumn],[NewColumn]=separator string ±split/mode[CurrentColumn]@[NewColumn],[NewColumn]=separator string ±split[CurrentColumnNumber]@(2)=separator string

Usage examples:

This small CSV table will be used in the following examples:

fullnames.csv
City	Name	Birth	Driver License
Rome	Andrea Bocelli	1958	y
Los Angeles	Lieutnant Columbo	1927	y
London	Hercule Poirot	1910	-
New York	Theodoros Kojak	1922	y

The following examples show how to split a column in various modes.
We will use the column Name for this.

csvfox fullnames.csv -split[Name]@[First_Name],[Last_Name] firstlastnames.csv

The separation takes place at the first whitespace (because this is the default split separator). That whitespace is lost.
The resulting CSV table is written to firstlastnames.csv.

firstlastnames.csv
City	First_Name	Last_Name	Birth	Driver License
Rome	Andrea	Bocelli	1958	y
Los Angeles	Lieutnant	Columbo	1927	y
London	Hercule	Poirot	1910	-
New York	Theodoros	Kojak	1922	y

-split/l[Name]@[Left],[Right]=o

The Name field is separated into two new columns at the first occurrence of the character "o". The "o" is appended to the left column.

City	Left	Right	Birth	Driver License
Rome	Andrea Bo	celli	1958	y
Los Angeles	Lieutnant Co	lumbo	1927	y
London	Hercule Po	irot	1910	-
New York	Theo	doros Kojak	1922	y

-split/r[Name]@[Left],[Right]=o

Again, the Name field is separated into two new columns at the first occurrence of the character "o". The "o" is prepended to the right column.

City	Left	Right	Birth	Driver License
Rome	Andrea B	ocelli	1958	y
Los Angeles	Lieutnant C	olumbo	1927	y
London	Hercule P	oirot	1910	-
New York	The	odoros Kojak	1922	y

-split/n[Name]@[Left\sPart],[Right\sPart]=5

The Name field is separated after the fifth character (i.e., the first column will be 5 characters wide). The new column names shall be "Left Part" and "Right Part".
Hint: For the "\s" as a placeholder for whitespace, see here: Character Escaping and Placeholders.
The placeholder might be required for using on the command line, but not inside a %job, %j file. You can use a plain whitespace there.

City	Left Part	Right Part	Birth	Driver License
Rome	Andre	a Bocelli	1958	y
Los Angeles	Lieut	nant Columbo	1927	y
London	Hercu	le Poirot	1910	-
New York	Theod	oros Kojak	1922	y

-split/-n[Name]@[Left],[Right]=5

A negative "n" means counting the position from the right end: The right column will consist of the last five characters of the Name column, the left column will get the (left-side) remainder.

City	Left	Right	Birth	Driver License
Rome	Andrea Bo	celli	1958	y
Los Angeles	Lieutnant Co	lumbo	1927	y
London	Hercule P	oirot	1910	-
New York	Theodoros	Kojak	1922	y

Splitting a column into more than two new columns

Description

This works exactly like before. You just provide even more new column names!
But because here more splitting takes place, there is the option to handle each splitting occurance differently (see below).

Pattern

±split[CurrentColumn]@[NewColumn],[NewColumn]-[NewColumn] ... ±split[CurrentColumn]@[NewColumn],[NewColumn]-[NewColumn]=separator ±split/mode[CurrentColumn]@[NewColumn],[NewColumn]-[NewColumn]=separator,separator,separator ...

Usage Examples:

Here is a product table "products.csv" that contains a nasty category path, which must be splitted into single category columns for further processing.

Products.csv
Name	ProductID	Category Path	Price
Green Fashion Jacket	JACKET1	Women : Outdoor : Jackets : Transitional	220 $
Black Rain Coat	RAINCOAT1	Women : Outdoor : Coats : Rain	200 $
Rubber Flubber Rainboots	W4589356	Women : Shoes : Boots	100 $
Dragon Style Men's Blue Jeans	J4534656	Men : Trousers : Jeans : Stonewashed : Used	100 $

The separator string is apparently " : " (whitespace, colon, whitespace).
So let's do it! CSVfox must auto-generate five new category levels for us.
The first one shall be named "Gender", then four sublevels named "Level2" to "Level5" must follow.
Hint: When using the hyphen "-" instead of a comma-separated list of desired column names, the ±split command can generate simple sequences of numbers or letters automatically. We use this feature for "Level" numbering.

-split[Category\sPath]@[Gender],[Level2]-[Level5]=\s:\s

This is the result:

ProductsWithCategories.csv
Name	ProductID	Gender	Level2	Level3	Level4	Level5	Price
Green Fashion Jacket	JACKET1	Women	Outdoor	Jackets	Transitional		220 $
Black Rain Coat	RAINCOAT1	Women	Outdoor	Coats	Rain		200 $
Rubber Flubber Rainboots	W4589356	Women	Shoes	Boots			100 $
Dragon Style Men's Blue Jeans	J4534656	Men	Trousers	Jeans	Stonewashed	Used	100 $

Each "\s" stands for a single whitespace as part of the separator.
You can leave whitespaces off in the separator string and instead define only the colon as separator, but then the content of all new columns should be shortened with ±trim afterwards. Additionally, a single embedded colon might be misunderstood as a category separator.

And this will happen if the mode "-s" (from right to left) is used:
-split/-s[Category\sPath]@[Gender],[Level2]-[Level5]=\s:\s

ProductsWithCategories.csv
Name	ProductID	Gender	Level2	Level3	Level4	Level5	Price
Green Fashion Jacket	JACKET1		Women	Outdoor	Jackets	Transitional	220 $
Black Rain Coat	RAINCOAT1		Women	Outdoor	Coats	Rain	200 $
Rubber Flubber Rainboots	W4589356			Women	Shoes	Boots	100 $
Dragon Style Men's Blue Jeans	J4534656	Men	Trousers	Jeans	Stonewashed	Used	100 $

This might be useful if the hierarchy is ordered from right to left, and not from left to right as in the example above.
If the "Gender" content shall still always be placed into the first new column, a two-steps approach will be required:
-split/s[Category\sPath]@[Gender],[Levels]=\s:\s
-split/-s[Levels]@[Level2]-[Level5]=\s:\s

ProductsWithCategories.csv
Name	ProductID	Gender	Level2	Level3	Level4	Level5	Price
Green Fashion Jacket	JACKET1	Women		Outdoor	Jackets	Transitional	220 $
Black Rain Coat	RAINCOAT1	Women		Outdoor	Coats	Rain	200 $
Rubber Flubber Rainboots	W4589356	Women			Shoes	Boots	100 $
Dragon Style Men's Blue Jeans	J4534656	Men	Trousers	Jeans	Stonewashed	Used	100 $

Using more separators for more fields

You can optionally define a series of splitting separator strings, one for each new column. In this case, the number of splitting separator strings must exactly correspond to the number of new columns minus 1, as each one will be used for separating one new field. These splitting separator strings must be concatenated by comma ,, like
-split[This]@[New1],[New2],[New3],[New4]=1.,2.,3. or
-split[This]@[New1],[New2],[New3],[New4]=-,-,\s .
When splitting on numeric positions, you can use the relative positions (i.e. lenghts) as separator parameter:
-split/n[This]@[New1],[New2],[New3],[New4]=-5,10,5 .

If the comma iself must be specified as separator, and it is the only separator, just write -split(...)=, .
But if the comma is part of a separator string instead, or included in the separator list, it must be escaped for clarity:
-split(...)=-,\,,- Here we have a list of three separators (hyphen, escaped comma, hyphen), for separating four new columns.

Image some to-do list whose steps should be separated. However, there will never be more than six steps (due to the size of this example).

to-do-list.csv
Name	To-Do
Call	1. call him 2. ask for his name 3. write down the name 4. forget the name
Write	1. write a letter 2. wait for the answer
Visit	1. Go there 2. ring the door bell 3. wait for the door being opened4. say "Good morning!" 5. enter 6. drink coffee

Now we will split the steps. We will scan for the separators which will be different in each step. We do not want to keep them, but we still have to list them as separators.

-split/s[To-Do]@[Step1]-[Step6]=2.\s,3.\s,4.\s,5.\s,6.\s

to-do-list-steps.csv
Name	Step1	Step2	Step3	Step4	Step5	Step6
Call	1. call him	ask for his name	write down the name	forget the name
Write	1. write a letter	wait for the answer
Visit	1. Go there	ring the door bell	wait for the door being opened	say "Good morning!"	enter	drink coffee

You see that we did not get rid of the first separator this way. A further command will be necessary to remove the "1. " from the start of the column "Step1".

Finally, here is an example how to use numeric splitting into more than two columns.
We have some keys together with alphanumeric codes that are 15 digits long, and want to split them into columns of five, six, and four digits.

codes.csv
Key	Code
Alpha	1KGVDEBI6KGGA6Z
Beta	KJGTUJ75HTDH87N
Gamma	KJHT654FNJ7FVPL
Delta	JZVV4390JGRDGBK

This is how we can split the "Code" column
(the remaining part after splitting at place 5, and then at place 6, will go into the third column "Code3"):
-split/n[Code]@[Code1],[Code2],[Code3]=5,6

threepartcodes.csv
Key	Code1	Code2	Code3
Alpha	1KGVD	EBI6KG	GA6Z
Beta	KJGTU	J75HTD	H87N
Gamma	KJHT6	54FNJ7	FVPL
Delta	JZVV4	390JGR	DGBK

But if the csv was already prepared with round brackets, like this:

codes.csv
Key	Code
Alpha	1KGVD(EBI6KG)GA6Z
Beta	KJGTU(J75HTD)H87N
Gamma	KJHT6(54FNJ7)FVPL
Delta	JZVV4(390JGR)DGBK

This is how we would split the "Code" column instead, making use of the brackets:
-split[Code]@[Code1],[Code2],[Code3]=$,$

threepartcodes.csv
Key	Code1	Code2	Code3
Alpha	1KGVD	EBI6KG	GA6Z
Beta	KJGTU	J75HTD	H87N
Gamma	KJHT6	54FNJ7	FVPL
Delta	JZVV4	390JGR	DGBK

Just do not forget to escape all brackets and "special" characters (see Placeholders).

Keeping the original column

If you also want to keep the original column together with the new ones, you should duplicate it before splitting.
-add[ColumnToSplit]@[OriginalColumn]=[OriginalColumn]
-split[ColumnToSplit]@[NewColumn1],[newColumn2]...
This way, the column [OriginalColumn] keeps its full contents, while [ColumnToSplit] will be replaced by the new columns.