CSVfox
Get the job done.
 English

±split

Splits a CSV column into two or more columns

This command splits a column into two or more columns, by replacing it by the new columns at the same place.

The basic principle is simple:

±split[Current Column]@[New Column],[New Column],...=separator
The new column names can be given as a comma-separated list, and/or as a column name range (see below).
The separator(s) past the Equal sign = control at which places the column will be splitted.

When column names are not used for output (i.e., -colnames has been set to "false"), the number of new columns in round brackets can be given instead. They will then have "empty" column names.

Splitting Modes

There are four modes of splitting, and each one works from left or from right. The splitting mode is defined by the command mode.

Separating columns from left to right:

The column content will be searched for the separator string from start to end.
  • split, or split/s. Splitting takes place at the first occurence of the separator string.(see below). The left section will be put into the first column, the remainder will be moved to the right. The separator string itself will be lost.
  • split/l (split left). Ditto, but the separator string will be appended to the left column.
  • split/r (split right). The separator string will be prepended to the right column.
  • split/n=n (split at position). Split after the n'th character of the column (i.e., left column width is n).

Separating columns from right to left:

The column content will be searched backwards for the separator string from end to start.
  • split/-s Splitting takes place at the last occurence of the separator string. The last section will be put into the rightmost column. The remainder will stay at left. The separator string itself will be lost.
  • split/-l (split left). Ditto, but the separator string will be appended to the left column.
  • split/-r (split right). The separator string will be prepended to the right column.
  • split/-n=n (split at position). Split before the n'th-right character of the column (i.e., right column width is n).

The separator

The separator string is defined by the command parameter after the "=". The content behind (or in front of) the separator will be transferred to the new column. If no separator is given, then the " " (whitespace) is assumed.
If the separator is not found in the column contents, the first new column gets the whole contents, and the other new column(s) remain empty.
If the mode is numeric (/n or /-n), there must be a number (or a series of numbers, see below) past the "=". That number is not a separator string, but it denotes the position where to split.

Splitting a column into two

This is most simple. Just name the existing column, the two new columns and the desired separator or position where to split, and the denoted column will be replaced by the two new ones.
If Column names are not used, then just the number of new columns can be specified (in round brackets) instead of their names.

Pattern

±split[CurrentColumn]@[NewColumn],[NewColumn] ±split[CurrentColumn]@[NewColumn],[NewColumn]=separator string ±split/mode[CurrentColumn]@[NewColumn],[NewColumn]=separator string ±split[CurrentColumnNumber]@(2)=separator string

Usage examples:

This small CSV table will be used in the following examples:
fullnames.csv
CityNameBirthDriver License
RomeAndrea Bocelli1958y
Los AngelesLieutnant Columbo1927y
LondonHercule Poirot1910-
New YorkTheodoros Kojak1922y
The following examples show how to split a column in various modes.
We will use the column Name for this.
csvfox fullnames.csv -split[Name]@[First_Name],[Last_Name] firstlastnames.csv
The separation takes place at the first whitespace (because this is the default split separator). That whitespace is lost.
The resulting CSV table is written to firstlastnames.csv.
firstlastnames.csv
CityFirst_NameLast_NameBirthDriver License
RomeAndreaBocelli1958y
Los AngelesLieutnantColumbo1927y
LondonHerculePoirot1910-
New YorkTheodorosKojak1922y
-split/l[Name]@[Left],[Right]=o
The Name field is separated into two new columns at the first occurrence of the character "o". The "o" is appended to the left column.
CityLeftRightBirthDriver License
RomeAndrea Bocelli1958y
Los AngelesLieutnant Columbo1927y
LondonHercule Poirot1910-
New YorkTheodoros Kojak1922y
-split/r[Name]@[Left],[Right]=o
Again, the Name field is separated into two new columns at the first occurrence of the character "o". The "o" is prepended to the right column.
CityLeftRightBirthDriver License
RomeAndrea Bocelli1958y
Los AngelesLieutnant Columbo1927y
LondonHercule Poirot1910-
New YorkTheodoros Kojak1922y
-split/n[Name]@[Left\sPart],[Right\sPart]=5
The Name field is separated after the fifth character (i.e., the first column will be 5 characters wide). The new column names shall be "Left Part" and "Right Part".
Hint: For the "\s" as a placeholder for whitespace, see here: Character Escaping and Placeholders.
The placeholder might be required for using on the command line, but not inside a %job, %j file. You can use a plain whitespace there.
CityLeft PartRight PartBirthDriver License
RomeAndrea Bocelli1958y
Los AngelesLieutnant Columbo1927y
LondonHercule Poirot1910-
New YorkTheodoros Kojak1922y
-split/-n[Name]@[Left],[Right]=5
A negative "n" means counting the position from the right end: The right column will consist of the last five characters of the Name column, the left column will get the (left-side) remainder.
CityLeftRightBirthDriver License
RomeAndrea Bocelli1958y
Los AngelesLieutnant Columbo1927y
LondonHercule Poirot1910-
New YorkTheodoros Kojak1922y

Splitting a column into more than two new columns

Description

This works exactly like before. You just provide even more new column names!
But because here more splitting takes place, there is the option to handle each splitting occurance differently (see below).

Pattern

±split[CurrentColumn]@[NewColumn],[NewColumn]-[NewColumn] ... ±split[CurrentColumn]@[NewColumn],[NewColumn]-[NewColumn]=separator ±split/mode[CurrentColumn]@[NewColumn],[NewColumn]-[NewColumn]=separator,separator,separator ...

Usage Examples:

Here is a product table "products.csv" that contains a nasty category path, which must be splitted into single category columns for further processing.

Products.csv
NameProductIDCategory PathPrice
Green Fashion JacketJACKET1Women : Outdoor : Jackets : Transitional220 $
Black Rain CoatRAINCOAT1Women : Outdoor : Coats : Rain200 $
Rubber Flubber RainbootsW4589356Women : Shoes : Boots100 $
Dragon Style Men's Blue JeansJ4534656Men : Trousers : Jeans : Stonewashed : Used100 $

The separator string is apparently " : " (whitespace, colon, whitespace).
So let's do it! CSVfox must auto-generate five new category levels for us.
The first one shall be named "Gender", then four sublevels named "Level2" to "Level5" must follow.
Hint: When using the hyphen "-" instead of a comma-separated list of desired column names, the ±split command can generate simple sequences of numbers or letters automatically. We use this feature for "Level" numbering.

-split[Category\sPath]@[Gender],[Level2]-[Level5]=\s:\s

This is the result:
ProductsWithCategories.csv
NameProductIDGenderLevel2Level3Level4Level5Price
Green Fashion JacketJACKET1WomenOutdoorJacketsTransitional220 $
Black Rain CoatRAINCOAT1WomenOutdoorCoatsRain200 $
Rubber Flubber RainbootsW4589356WomenShoesBoots100 $
Dragon Style Men's Blue JeansJ4534656MenTrousersJeansStonewashedUsed100 $

Each "\s" stands for a single whitespace as part of the separator.
You can leave whitespaces off in the separator string and instead define only the colon as separator, but then the content of all new columns should be shortened with ±trim afterwards. Additionally, a single embedded colon might be misunderstood as a category separator.

And this will happen if the mode "-s" (from right to left) is used:
-split/-s[Category\sPath]@[Gender],[Level2]-[Level5]=\s:\s

ProductsWithCategories.csv
NameProductIDGenderLevel2Level3Level4Level5Price
Green Fashion JacketJACKET1WomenOutdoorJacketsTransitional220 $
Black Rain CoatRAINCOAT1WomenOutdoorCoatsRain200 $
Rubber Flubber RainbootsW4589356WomenShoesBoots100 $
Dragon Style Men's Blue JeansJ4534656MenTrousersJeansStonewashedUsed100 $

This might be useful if the hierarchy is ordered from right to left, and not from left to right as in the example above.
If the "Gender" content shall still always be placed into the first new column, a two-steps approach will be required:
-split/s[Category\sPath]@[Gender],[Levels]=\s:\s
-split/-s[Levels]@[Level2]-[Level5]=\s:\s

ProductsWithCategories.csv
NameProductIDGenderLevel2Level3Level4Level5Price
Green Fashion JacketJACKET1WomenOutdoorJacketsTransitional220 $
Black Rain CoatRAINCOAT1WomenOutdoorCoatsRain200 $
Rubber Flubber RainbootsW4589356WomenShoesBoots100 $
Dragon Style Men's Blue JeansJ4534656MenTrousersJeansStonewashedUsed100 $

Using more separators for more fields

You can optionally define a series of splitting separator strings, one for each new column. In this case, the number of splitting separator strings must exactly correspond to the number of new columns minus 1, as each one will be used for separating one new field. These splitting separator strings must be concatenated by comma ,, like
-split[This]@[New1],[New2],[New3],[New4]=1.,2.,3. or
-split[This]@[New1],[New2],[New3],[New4]=-,-,\s .
When splitting on numeric positions, you can use the relative positions (i.e. lenghts) as separator parameter:
-split/n[This]@[New1],[New2],[New3],[New4]=-5,10,5  .

If the comma iself must be specified as separator, and it is the only separator, just write -split(...)=, .
But if the comma is part of a separator string instead, or included in the separator list, it must be escaped for clarity:
-split(...)=-,\,,- Here we have a list of three separators (hyphen, escaped comma, hyphen), for separating four new columns.


Image some to-do list whose steps should be separated. However, there will never be more than six steps (due to the size of this example).
to-do-list.csv
NameTo-Do
Call1. call him 2. ask for his name 3. write down the name 4. forget the name
Write1. write a letter 2. wait for the answer
Visit1. Go there 2. ring the door bell 3. wait for the door being opened4. say "Good morning!" 5. enter 6. drink coffee

Now we will split the steps. We will scan for the separators which will be different in each step. We do not want to keep them, but we still have to list them as separators.

-split/s[To-Do]@[Step1]-[Step6]=2.\s,3.\s,4.\s,5.\s,6.\s
to-do-list-steps.csv
NameStep1Step2Step3Step4Step5Step6
Call1. call him ask for his name write down the name forget the name
Write1. write a letter wait for the answer
Visit1. Go there ring the door bell wait for the door being openedsay "Good morning!" enter drink coffee

You see that we did not get rid of the first separator this way. A further command will be necessary to remove the "1. " from the start of the column "Step1".

Finally, here is an example how to use numeric splitting into more than two columns.
We have some keys together with alphanumeric codes that are 15 digits long, and want to split them into columns of five, six, and four digits.
codes.csv
KeyCode
Alpha1KGVDEBI6KGGA6Z
BetaKJGTUJ75HTDH87N
GammaKJHT654FNJ7FVPL
DeltaJZVV4390JGRDGBK

This is how we can split the "Code" column
(the remaining part after splitting at place 5, and then at place 6, will go into the third column "Code3"):
-split/n[Code]@[Code1],[Code2],[Code3]=5,6

threepartcodes.csv
KeyCode1Code2Code3
Alpha1KGVDEBI6KGGA6Z
BetaKJGTUJ75HTDH87N
GammaKJHT654FNJ7FVPL
DeltaJZVV4390JGRDGBK

But if the csv was already prepared with round brackets, like this:

codes.csv
KeyCode
Alpha1KGVD(EBI6KG)GA6Z
BetaKJGTU(J75HTD)H87N
GammaKJHT6(54FNJ7)FVPL
DeltaJZVV4(390JGR)DGBK

This is how we would split the "Code" column instead, making use of the brackets:
-split[Code]@[Code1],[Code2],[Code3]=\(,\)
threepartcodes.csv
KeyCode1Code2Code3
Alpha1KGVDEBI6KGGA6Z
BetaKJGTUJ75HTDH87N
GammaKJHT654FNJ7FVPL
DeltaJZVV4390JGRDGBK

Just do not forget to escape all brackets and "special" characters (see Placeholders).

Keeping the original column

If you also want to keep the original column together with the new ones, you should duplicate it before splitting.
-add[ColumnToSplit]@[OriginalColumn]=[OriginalColumn]
-split[ColumnToSplit]@[NewColumn1],[newColumn2]...
This way, the column [OriginalColumn] keeps its full contents, while [ColumnToSplit] will be replaced by the new columns.

Under construction, coming soon