Never Use StringTokenizer

08 Feb

You should never use StringTokenizer.  That is a strong statement, and a bit of an exaggeration.  But in my experience, every use of StringTokenizer that I have seen has been an abuse of the original concept, and has ended up as larger, more complex, and harder-to-maintain code.

The purpose of a class should be to simplify code, making it easier to use.  At first glance, StringTokenizer appears to be a very useful construct.  It is a kind of miniature parser that you can use to unpack strings that have been put together, and it gives you back one section at a time.  If the string is just a list of comma-delimited values, and the individual values never contain commas themselves, then StringTokenizer works fairly well.  But for anything more complicated, it becomes so difficult to keep track of the current state and to recover from error conditions that StringTokenizer gets in the way more than it helps.
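One concrete way the "anything more complicated" case shows up is an empty field.  The sketch below is only an illustration (the class name, method names, and the sample record are made up for this example): StringTokenizer treats consecutive delimiters as one, so an empty field silently disappears and the remaining values shift position, while String.split with a negative limit keeps it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

final class EmptyFieldDemo {

    // Collects every token StringTokenizer yields for a comma-delimited line.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            fields.add(st.nextToken());
        }
        return fields;
    }

    // Splits on every comma; the -1 limit keeps empty fields, including trailing ones.
    static List<String> splitKeepingEmpty(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        String line = "42,,disk full";   // hypothetical record with an empty middle field
        System.out.println(tokenize(line));          // the empty field is skipped
        System.out.println(splitKeepingEmpty(line)); // the empty field is preserved
    }
}
```

With the tokenizer, the caller can no longer tell which column a value came from, which is exactly the kind of state that then has to be tracked by hand.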

1      Example With StringTokenizer

Below is a real example that attempts to parse a file that contains name/value pairs.  Note how the logic handles the various error conditions.  The next section contains the same functionality without using StringTokenizer.

/**
 *      Loads and reads the errorMessage from the specified file location.
 *      @param file- the location of the error message file.
 */
public static void LoadErrMessage(String file) {
    errorTable = new Hashtable();
    try{
        BufferedReader in =
            new BufferedReader(new FileReader(file));
        String line;
        while((line = in.readLine())!= null){
            if(line.length() > 0){
                String errorCode = getErrorCode(line);
                String errorMessage = getMessage(line);
                if(errorCode != null && errorMessage != null)
                    errorTable.put(errorCode, errorMessage);
            }
        }
        in.close();

    }
    catch (Exception e){
        dLog.prn(DebugLog.ERR, "Exception in Reading file: " + e);
    }

}

/**
 *      Returns the errorCode of the specified errorMessage.
 *
 *      @param line - the input error message.
 *      @return       the corresponding error code.
 */
public static String getErrorCode(String line){
    StringTokenizer st = new StringTokenizer(line, ",");
    Integer errorCode = null;
    String errorCodeString = null;
    if(st.hasMoreElements()){
        errorCodeString = (String)st.nextElement();
        errorCodeString.trim();
    }

    if(errorCodeString.startsWith("#")){
        errorCodeString = errorCodeString.substring(1);
        try{
            errorCode = new Integer(errorCodeString);
        }
        catch(Exception e){
            System.err.println("Exception Converting (" + errorCodeString
                + ") to Integer " +  e);
            return null;
        }
    }
    return errorCodeString;

}

public static String getMessage(String errorCode,String line){
    return getMessage(line);
}

public static String getMessage(String line){
    StringTokenizer st = new StringTokenizer(line, ",");
    String errorMessage = "";
    try{
        st.nextElement();
    }
    catch(Exception e){
    }

    while(st.hasMoreElements()){
        errorMessage += (String)st.nextElement();
    }
    errorMessage.trim();
    if(errorMessage != "")
    {
        errorMessage.trim();
        return errorMessage;
    }
    else
        return null;
}

2      Example Without StringTokenizer

Here is the same logic implemented using just simple String scan functions.  It is shorter, far easier to read, and easier to debug and maintain.  It should run faster too, since each line is scanned only once and no tokenizer objects are created.

/**
 *      Loads and reads the errorMessage from the specified file location.
 *      @param file- the location of the error message file.
 */
public static void LoadErrMessage(String file){
    errorTable = new Hashtable();
    try{
        BufferedReader in =
            new BufferedReader(new FileReader(file));
        String line;
        while((line = in.readLine())!= null) {
            if(line.length() == 0) {
                continue;
            }

            // find the comma that separates the number from the msg
            int commaPos = line.indexOf(',');
            if (commaPos <= 0 || commaPos+1 >= line.length()) {
                continue;
            }

            String errorCode = line.substring(0,commaPos).trim();
            String errorMessage = line.substring(commaPos+1).trim();
            errorTable.put(errorCode, errorMessage);
        }
        in.close();
    }
    catch (Exception e){
        dLog.prn(DebugLog.ERR, "Exception in Reading file: " + e);
    }
}

While the first example could be written better, it is nevertheless the case that a version using StringTokenizer could NOT be made as short and easy to read as this.  I have found this to be true in general.  The most important thing to take away from this:

Do not blindly use StringTokenizer assuming that it is the “right” way to do something just because it is Java and because it is O-O.  It is very easy to abuse StringTokenizer.  Assume it is a bad idea unless you know that it is the right tool to use.
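To try the indexOf approach from section 2 in isolation, the per-line logic can be pulled into a small helper.  This is only a sketch under assumptions: the class name ErrorLineParser, the method parseLine, and the choice to return null for a malformed line are all invented here and are not part of the original product code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

final class ErrorLineParser {

    // Splits "code, message" at the FIRST comma.
    // Returns null when the line has no comma, no code before it, or nothing after it.
    static Map.Entry<String, String> parseLine(String line) {
        if (line == null) {
            return null;
        }
        int commaPos = line.indexOf(',');
        if (commaPos <= 0 || commaPos + 1 >= line.length()) {
            return null;
        }
        String errorCode = line.substring(0, commaPos).trim();
        String errorMessage = line.substring(commaPos + 1).trim();
        return new SimpleEntry<>(errorCode, errorMessage);
    }

    public static void main(String[] args) {
        // Hypothetical line: the message itself contains a comma.
        Map.Entry<String, String> entry = parseLine("101, Disk full, retry later");
        System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
}
```

Because only the first comma is treated as the separator, a message that contains commas survives intact.  The tokenizer version in section 1 glues the remaining tokens back together without their commas.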

3      Another case

Here is another piece of actual live code I found in a product:

StringTokenizer st = new StringTokenizer(fullClassName, delimiter);
String className = null;
while(st.hasMoreTokens()) {
    className = st.nextToken();
}

This code is apparently trying to take apart a string with slashes in it and find the part of the string after the last slash.  The string is a path expression, and the last segment is the file name (also the class name).  It does so by creating a string tokenizer and getting every piece of the string in succession, just so that when the loop is done you are left holding a reference to the last segment.

A far better approach is to simply find the last delimiter, and take the part of the string after it.  This code might look like this:

int slashpos = fullClassName.lastIndexOf(delimiter);
String className = fullClassName.substring(slashpos + 1);

It just goes to show how inventive people can be in order to come up with ways to use StringTokenizer incorrectly.
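For reference, here is a self-contained sketch of the lastIndexOf approach that can be run on its own.  The class and method names are made up for illustration, and the delimiter is assumed to be a single character such as a slash.

```java
final class LastSegment {

    // Returns everything after the last delimiter.
    // When the delimiter is absent, lastIndexOf returns -1, so substring(0)
    // hands back the whole string unchanged.
    static String lastSegment(String path, char delimiter) {
        int pos = path.lastIndexOf(delimiter);
        return path.substring(pos + 1);
    }

    public static void main(String[] args) {
        // Hypothetical path expression.
        System.out.println(lastSegment("com/example/Widget", '/'));
    }
}
```

One behavioral difference is worth knowing before swapping one for the other: for a string that ends with the delimiter, this helper returns an empty string, whereas the tokenizer loop would return the last non-empty segment.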

4      A case where StringTokenizer is properly used

Here is another use of StringTokenizer:

StringTokenizer st = new StringTokenizer(decodedUserPass, ":");
int numTokens = st.countTokens();
if (numTokens != 2) {
    throw new Exception("Invalid username and password in header");
}
String userName = st.nextToken();
String passwd = st.nextToken();

This is correctly written in the sense that it is efficiently written and there is no bad coding practice.  But how much work is it to write without StringTokenizer?

int pos = decodedUserPass.indexOf(":");
if (pos < 0) {
    throw new Exception("Invalid username and password in header");
}
String userName = decodedUserPass.substring(0,pos);
// everything after the colon is the password
String passwd = decodedUserPass.substring(pos+1);

There you have it: even in a case that is “perfect” for StringTokenizer, it takes one fewer line to simply search for the colon.  Note also that “countTokens” must scan the complete string in order to count the tokens, and then the string is scanned again when you get the tokens out.  You make two passes through the string when a single search for the colon would do.  The search also takes less memory, and leaves fewer objects to be cleaned up (although this is negligible).
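The colon search can be wrapped up as a runnable sketch.  The class name UserPass and the use of IllegalArgumentException are assumptions made for this example; the original code throws a plain Exception.

```java
final class UserPass {
    final String userName;
    final String password;

    private UserPass(String userName, String password) {
        this.userName = userName;
        this.password = password;
    }

    // Splits "user:password" at the FIRST colon.
    static UserPass parse(String decodedUserPass) {
        int pos = decodedUserPass.indexOf(':');
        if (pos < 0) {
            throw new IllegalArgumentException("Invalid username and password in header");
        }
        return new UserPass(decodedUserPass.substring(0, pos),
                            decodedUserPass.substring(pos + 1));
    }

    public static void main(String[] args) {
        UserPass up = parse("alice:s3cret");   // hypothetical credentials
        System.out.println(up.userName + " / " + up.password);
    }
}
```

The two versions are not strictly equivalent, so pick the behavior you actually want.  The tokenizer version rejects a password that contains a colon (three tokens) and rejects an empty password (one token), while this sketch accepts both and treats everything after the first colon as the password.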

5      Burnt Again

Even though I wrote this years ago, developers keep on using StringTokenizer.  In this case I had a program crashing.  Do you know why?  StringTokenizer does not behave well when a null is passed to it: the constructor throws a NullPointerException.  Once again I am cleaning up this code.
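A minimal sketch of the failure and one possible guard follows.  The helper name countTokensOrZero is hypothetical, and treating null as "no tokens" is an assumption that may not suit every caller.

```java
import java.util.StringTokenizer;

final class NullSafeTokens {

    // Guards against null before constructing the tokenizer.
    // Assumption for this sketch: a null input simply means zero tokens.
    static int countTokensOrZero(String input, String delimiters) {
        if (input == null) {
            return 0;
        }
        return new StringTokenizer(input, delimiters).countTokens();
    }

    public static void main(String[] args) {
        try {
            new StringTokenizer((String) null, ",");   // fails in the constructor
        } catch (NullPointerException e) {
            System.out.println("constructor threw NullPointerException");
        }
        System.out.println(countTokensOrZero(null, ","));
    }
}
```

Whether to return a default or fail with a clearer message is a design choice; the point is that the null check has to be written by hand either way.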


Posted on February 8, 2014 in Coding

 
