Strings in Jython
- A string is a list of characters in order (letter, a number, backslash, quote...)
- Jython recognizes as strings everything that is delimited by quotation marks (" " or ' ')
- String/Array index starts at 0
- We use [ ] to access characters in a string
- Strings can be enclosed with single quotes 'mystring'
- Strings can also be enclosed with double quotes "mystring"
- Strings can also be enclosed in triple quotes when multi-lines are used
- Substring is str[Start Position: Start Position + Length]
Raw strings are used when we want to express strings that would be modified by escape sequence processing. They are very useful when using paths as literals. Raw strings are prefixed by r.
In the following example we want to print the new line character:
Operating with Strings
Although there are many functions and operators for Strings, the most common are:
- Concatenation (+): combines 2+ strings into new string object
- Repeat (*n): generates a string with n characters
- split (): splits a string into a string array using the separator defined. If not separator is defined, whitespace is used
- startswith/endswith: checks if a string starts or ends with a substring
- strip: strip(), lstrip(), rstrip() are methods for removing any character from both ends of a string. If the character to be removed are not specified then whitespace will be removed. Same functionality as TRIM in VB.
- join: takes a list of strings and joins them together with the calling string in between each element
- Test: a string can be tested for truth value (T or F)
- Replacing templates (%): the percent "%" character marks the start of the string to be included
- %s used for strings
- %d used for numbers
- %f used for floating points
You can find more details about working with strings (functions, methods, operators...) in: http://www.jython.org/docs/library/string.html
Array Slice Notation
We can use slice notation to access characters or substrings of a string.
:end represents the first value that is not in the selected slice.
So the difference between end and start is the number of elements selected.
The other key point is that start or end may be a negative number, which means it counts from the end of the array instead of the beginning.
You will have to take care with situations where there are fewer items than you ask for. Jython will not return an error but an empty list. On the other hand, if you just want to access a position by using [ ] then you will get an error if the position does not exist:Example 1: my account description has apostrophe character
Let's say we have the following source file where our Description has apostrophe (single quote):
It seems that our apostrophe is causing the issue. FDMEE thinks that the single quote delimits the data row when it is passed to the import format process and therefore it does not find any amount in the line:
EastSales;LaborCost;akaFDMEE's example;[ICP None];Comma_PDAs;NoCustomer;NoChannel;[None];1000
We can easily test it by removing the single quote to see how data is imported. But we don't want this :-)
We would like to parse the file so the amount is processed and rows are successfully imported.
What are the key points for our solution?
- Replace apostrophe by empty string when processing amounts
- Need to parse any field after the description having the apostrophe
The following script replaces the apostrophe (single quote) by empty string and then return the 9th column which is actually the amount:
We have to create one script for each dimension...and set field number to 1 for all of them.
All data will be imported successfully:
Example 2: all my columns are enclosed in double quotes
There are many situations where files are exported from source systems with the following format:
As comma is the delimiter and amount is using Euro notation (comma as decimal delimiter), all fields are enclosed in double quotes " " in order to have the correct number of fields.
Let's try to import this file and see the results:
There are many solutions...one of them consists in stripping double quotes for each dimension. The following script will do that: