±split
Splits a CSV column into two or more columns
This command splits a column into two or more columns, by replacing it by the new columns at the same place.
The basic principle is simple:
The separator(s) past the Equal sign
When column names are not used for output (i.e., -colnames has been set to "false"), the number of new columns in round brackets can be given instead. They will then have "empty" column names.
Splitting Modes
There are four modes of splitting, and each one works from left or from right. The splitting mode is defined by the command mode.
Separating columns from left to right:
The column content will be searched for the separator string from start to end.split , orsplit/s . Splitting takes place at the first occurence of the separator string.(see below). The left section will be put into the first column, the remainder will be moved to the right. The separator string itself will be lost.split/l (split left). Ditto, but the separator string will be appended to the left column.split/r (split right). The separator string will be prepended to the right column.split/n=n (split at position). Split after the n'th character of the column (i.e., left column width is n).
Separating columns from right to left:
The column content will be searched backwards for the separator string from end to start.split/-s Splitting takes place at the last occurence of the separator string. The last section will be put into the rightmost column. The remainder will stay at left. The separator string itself will be lost.split/-l (split left). Ditto, but the separator string will be appended to the left column.split/-r (split right). The separator string will be prepended to the right column.split/-n=n (split at position). Split before the n'th-right character of the column (i.e., right column width is n).
The separator
The separator string is defined by the command parameter after the "=". The content behind (or in front of) the separator will be transferred to the new column. If no separator is given, then the " " (whitespace) is assumed.If the separator is not found in the column contents, the first new column gets the whole contents, and the other new column(s) remain empty.
If the mode is numeric (/n or /-n), there must be a number (or a series of numbers, see below) past the "=". That number is not a separator string, but it denotes the position where to split.
Splitting a column into two
This is most simple. Just name the existing column, the two new columns and the desired separator or position where to split, and the denoted column will be replaced by the two new ones.If Column names are not used, then just the number of new columns can be specified (in round brackets) instead of their names.
Pattern
Usage examples:
This small CSV table will be used in the following examples:City | Name | Birth | Driver License |
---|---|---|---|
Rome | Andrea Bocelli | 1958 | y |
Los Angeles | Lieutnant Columbo | 1927 | y |
London | Hercule Poirot | 1910 | - |
New York | Theodoros Kojak | 1922 | y |
We will use the column
- csvfox fullnames.csv -split[Name]@[First_Name],[Last_Name] firstlastnames.csv
- The separation takes place at the first whitespace (because this is the default split separator). That whitespace is lost.
The resulting CSV table is written to firstlastnames.csv.firstlastnames.csv City First_Name Last_Name Birth Driver License Rome Andrea Bocelli 1958 y Los Angeles Lieutnant Columbo 1927 y London Hercule Poirot 1910 - New York Theodoros Kojak 1922 y - -split/l[Name]@[Left],[Right]=o
- The
Name field is separated into two new columns at the first occurrence of the character "o". The "o" is appended to the left column.City Left Right Birth Driver License Rome Andrea Bo celli 1958 y Los Angeles Lieutnant Co lumbo 1927 y London Hercule Po irot 1910 - New York Theo doros Kojak 1922 y - -split/r[Name]@[Left],[Right]=o
- Again, the
Name field is separated into two new columns at the first occurrence of the character "o". The "o" is prepended to the right column.City Left Right Birth Driver License Rome Andrea B ocelli 1958 y Los Angeles Lieutnant C olumbo 1927 y London Hercule P oirot 1910 - New York The odoros Kojak 1922 y - -split/n[Name]@[Left\sPart],[Right\sPart]=5
- The
Name field is separated after the fifth character (i.e., the first column will be 5 characters wide). The new column names shall be "Left Part" and "Right Part".
Hint: For the "\s" as a placeholder for whitespace, see here: Character Escaping and Placeholders.
The placeholder might be required for using on the command line, but not inside a %job, %j file. You can use a plain whitespace there.City Left Part Right Part Birth Driver License Rome Andre a Bocelli 1958 y Los Angeles Lieut nant Columbo 1927 y London Hercu le Poirot 1910 - New York Theod oros Kojak 1922 y - -split/-n[Name]@[Left],[Right]=5
- A negative "n" means counting the position from the right end: The right column will consist of the last five characters of the
Name column, the left column will get the (left-side) remainder.City Left Right Birth Driver License Rome Andrea Bo celli 1958 y Los Angeles Lieutnant Co lumbo 1927 y London Hercule P oirot 1910 - New York Theodoros Kojak 1922 y
Splitting a column into more than two new columns
Description
This works exactly like before. You just provide even more new column names!
But because here more splitting takes place, there is the option to handle each splitting occurance differently (see below).
Pattern
Usage Examples:
Here is a product table "products.csv" that contains a nasty category path, which must be splitted into single category columns for further processing.
Name | ProductID | Category Path | Price |
---|---|---|---|
Green Fashion Jacket | JACKET1 | Women : Outdoor : Jackets : Transitional | 220 $ |
Black Rain Coat | RAINCOAT1 | Women : Outdoor : Coats : Rain | 200 $ |
Rubber Flubber Rainboots | W4589356 | Women : Shoes : Boots | 100 $ |
Dragon Style Men's Blue Jeans | J4534656 | Men : Trousers : Jeans : Stonewashed : Used | 100 $ |
So let's do it! CSVfox must auto-generate five new category levels for us.
The first one shall be named "Gender", then four sublevels named "Level2" to "Level5" must follow.
Hint: When using the hyphen "-" instead of a comma-separated list of desired column names, the ±split command can generate simple sequences of numbers or letters automatically. We use this feature for "Level" numbering.
-split[Category\sPath]@[Gender],[Level2]-[Level5]=\s:\s
This is the result:
Name | ProductID | Gender | Level2 | Level3 | Level4 | Level5 | Price |
---|---|---|---|---|---|---|---|
Green Fashion Jacket | JACKET1 | Women | Outdoor | Jackets | Transitional | 220 $ | |
Black Rain Coat | RAINCOAT1 | Women | Outdoor | Coats | Rain | 200 $ | |
Rubber Flubber Rainboots | W4589356 | Women | Shoes | Boots | 100 $ | ||
Dragon Style Men's Blue Jeans | J4534656 | Men | Trousers | Jeans | Stonewashed | Used | 100 $ |
Each "\s" stands for a single whitespace as part of the separator.
You can leave whitespaces off in the separator string and instead define only the colon as separator, but then the content of all new columns should be shortened with ±trim afterwards. Additionally, a single embedded colon might be misunderstood as a category separator.
And this will happen if the mode "-s" (from right to left) is used:
-split/-s[Category\sPath]@[Gender],[Level2]-[Level5]=\s:\s
Name | ProductID | Gender | Level2 | Level3 | Level4 | Level5 | Price |
---|---|---|---|---|---|---|---|
Green Fashion Jacket | JACKET1 | Women | Outdoor | Jackets | Transitional | 220 $ | |
Black Rain Coat | RAINCOAT1 | Women | Outdoor | Coats | Rain | 200 $ | |
Rubber Flubber Rainboots | W4589356 | Women | Shoes | Boots | 100 $ | ||
Dragon Style Men's Blue Jeans | J4534656 | Men | Trousers | Jeans | Stonewashed | Used | 100 $ |
This might be useful if the hierarchy is ordered from right to left, and not from left to right as in the example above.
If the "Gender" content shall still always be placed into the first new column, a two-steps approach will be required:
-split/s[Category\sPath]@[Gender],[Levels]=\s:\s
-split/-s[Levels]@[Level2]-[Level5]=\s:\s
Name | ProductID | Gender | Level2 | Level3 | Level4 | Level5 | Price |
---|---|---|---|---|---|---|---|
Green Fashion Jacket | JACKET1 | Women | Outdoor | Jackets | Transitional | 220 $ | |
Black Rain Coat | RAINCOAT1 | Women | Outdoor | Coats | Rain | 200 $ | |
Rubber Flubber Rainboots | W4589356 | Women | Shoes | Boots | 100 $ | ||
Dragon Style Men's Blue Jeans | J4534656 | Men | Trousers | Jeans | Stonewashed | Used | 100 $ |
Using more separators for more fields
You can optionally define a series of splitting separator strings, one for each new column. In this case, the number of splitting separator strings must exactly correspond to the number of new columns minus 1, as each one will be used for separating one new field. These splitting separator strings must be concatenated by comma
When splitting on numeric positions, you can use the relative positions (i.e. lenghts) as separator parameter:
If the comma iself must be specified as separator, and it is the only separator, just write
But if the comma is part of a separator string instead, or included in the separator list, it must be escaped for clarity:
Image some to-do list whose steps should be separated. However, there will never be more than six steps (due to the size of this example).
Name | To-Do |
---|---|
Call | 1. call him 2. ask for his name 3. write down the name 4. forget the name |
Write | 1. write a letter 2. wait for the answer |
Visit | 1. Go there 2. ring the door bell 3. wait for the door being opened4. say "Good morning!" 5. enter 6. drink coffee |
Now we will split the steps. We will scan for the separators which will be different in each step. We do not want to keep them, but we still have to list them as separators.
-split/s[To-Do]@[Step1]-[Step6]=2.\s,3.\s,4.\s,5.\s,6.\s
Name | Step1 | Step2 | Step3 | Step4 | Step5 | Step6 |
---|---|---|---|---|---|---|
Call | 1. call him | ask for his name | write down the name | forget the name | ||
Write | 1. write a letter | wait for the answer | ||||
Visit | 1. Go there | ring the door bell | wait for the door being opened | say "Good morning!" | enter | drink coffee |
You see that we did not get rid of the first separator this way. A further command will be necessary to remove the "1. " from the start of the column "Step1".
Finally, here is an example how to use numeric splitting into more than two columns.
We have some keys together with alphanumeric codes that are 15 digits long, and want to split them into columns of five, six, and four digits.
Key | Code |
---|---|
Alpha | 1KGVDEBI6KGGA6Z |
Beta | KJGTUJ75HTDH87N |
Gamma | KJHT654FNJ7FVPL |
Delta | JZVV4390JGRDGBK |
This is how we can split the "Code" column
(the remaining part after splitting at place 5, and then at place 6, will go into the third column "Code3"):
-split/n[Code]@[Code1],[Code2],[Code3]=5,6
Key | Code1 | Code2 | Code3 |
---|---|---|---|
Alpha | 1KGVD | EBI6KG | GA6Z |
Beta | KJGTU | J75HTD | H87N |
Gamma | KJHT6 | 54FNJ7 | FVPL |
Delta | JZVV4 | 390JGR | DGBK |
But if the csv was already prepared with round brackets, like this:
Key | Code |
---|---|
Alpha | 1KGVD(EBI6KG)GA6Z |
Beta | KJGTU(J75HTD)H87N |
Gamma | KJHT6(54FNJ7)FVPL |
Delta | JZVV4(390JGR)DGBK |
This is how we would split the "Code" column instead, making use of the brackets:
-split[Code]@[Code1],[Code2],[Code3]=\(,\)
Key | Code1 | Code2 | Code3 |
---|---|---|---|
Alpha | 1KGVD | EBI6KG | GA6Z |
Beta | KJGTU | J75HTD | H87N |
Gamma | KJHT6 | 54FNJ7 | FVPL |
Delta | JZVV4 | 390JGR | DGBK |
Just do not forget to escape all brackets and "special" characters (see Placeholders).
Keeping the original column
If you also want to keep the original column together with the new ones, you should duplicate it before splitting.
This way, the column