Section 7.10 From the Java Library: java.util.StringTokenizer

One of the most widespread string-processing tasks is that of breaking up a string into its components, or tokens.

For example, when processing a sentence, you may need to break the sentence into its constituent words, which are considered the sentence tokens. When processing a name-password string, such as “boyd:14irXp”, you may need to break it into a name and a password.

Tokens are separated from each other by one or more characters known as delimiters. Thus, for a sentence, white space, including blank spaces, tabs, and line feeds, serves as the delimiter. For the password example, the colon character serves as the delimiter.

Figure 7.10.1. The `StringTokenizer` class.

Java’s java.util.StringTokenizer class is specially designed for breaking strings into their tokens (Figure 7.10.1). When instantiated with a String parameter, a StringTokenizer breaks the string into tokens, using white space as delimiters. For example, if we instantiated a StringTokenizer as in the code

StringTokenizer sTokenizer
  = new StringTokenizer("This is an English sentence.");

it would break the string into the following tokens, which it would return one at a time in the order shown:

This
is
an
English
sentence.

Note that the period is part of the last token (“sentence.”). This is because punctuation marks are not considered delimiters by default.
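
As a minimal sketch of this default behavior, the following program collects the tokens of the example sentence into a list. The class name `DefaultDelimiterSketch` and the helper `tokenize` are hypothetical names chosen for illustration; the loop relies on the `hasMoreTokens()` and `nextToken()` methods of `StringTokenizer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DefaultDelimiterSketch {
    // Hypothetical helper: gathers the tokens produced by the
    // one-argument constructor, which splits on white space only.
    static List<String> tokenize(String text) {
        StringTokenizer sTokenizer = new StringTokenizer(text);
        List<String> tokens = new ArrayList<>();
        while (sTokenizer.hasMoreTokens()) {
            tokens.add(sTokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [This, is, an, English, sentence.]
        System.out.println(tokenize("This is an English sentence."));
    }
}
```

The last element of the list is "sentence." with its period attached, because the period is not among the default delimiters.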

If you wanted to include punctuation symbols as delimiters, you could use the second StringTokenizer() constructor, which takes a second String parameter (Figure 7.10.1). The second parameter specifies a string of those characters that should be used as delimiters. For example, in the instantiation,

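StringTokenizer sTokenizer
    = new StringTokenizer("This is an English sentence.",
                          " \t\n,;.!");

various punctuation symbols (periods, commas, and so on) are included among the delimiters. Note that the delimiter string begins with a literal space character and that the escape sequences \t and \n specify tabs and newlines. Because the second parameter replaces the default delimiters rather than adding to them, the white space characters must be listed explicitly.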
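
The effect of the second parameter can be sketched as follows. The class name `PunctuationDelimiterSketch`, the helper `tokenize`, and the constant `DELIMITERS` are hypothetical names used only for this illustration. The delimiter string here starts with a literal space so that blanks still separate tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class PunctuationDelimiterSketch {
    // Assumed delimiter set for illustration: a literal space, tab,
    // newline, and a few punctuation marks.
    static final String DELIMITERS = " \t\n,;.!";

    // Hypothetical helper: gathers the tokens produced by the
    // two-argument constructor with the given delimiter characters.
    static List<String> tokenize(String text, String delimiters) {
        StringTokenizer sTokenizer = new StringTokenizer(text, delimiters);
        List<String> tokens = new ArrayList<>();
        while (sTokenizer.hasMoreTokens()) {
            tokens.add(sTokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "This is an English sentence.";

        // Prints: [This, is, an, English, sentence]
        System.out.println(tokenize(sentence, DELIMITERS));

        // With punctuation only, blanks no longer separate tokens.
        // Prints: [This is an English sentence]
        System.out.println(tokenize(sentence, ",;.!"));
    }
}
```

The second call shows that the delimiter string replaces the default set: once white space is left out, the whole sentence (minus its period) comes back as a single token.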

The hasMoreTokens() and nextToken() methods can be used to process a delimited string, one token at a time. The first method returns true as long as more tokens remain; the second returns the next token. For example, here’s a code segment that will break a standard URL string into its constituent parts:

String url = "http://java.trincoll.edu/~jjj/index.html";
StringTokenizer sTokenizer = new StringTokenizer(url, ":/");
while (sTokenizer.hasMoreTokens()) {
    System.out.println(sTokenizer.nextToken());
}

This code segment will produce the following output:

http
java.trincoll.edu
~jjj
index.html

The only delimiters used in this case were the “:” and “/” symbols. Note also that consecutive delimiters, as in “://”, do not produce empty tokens: nextToken() skips over the entire run of delimiters and never returns an empty string.
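
The name-password example from the start of this section can be handled the same way. The following is a minimal sketch, assuming that neither the name nor the password contains a colon. The class name `NamePasswordSketch` and the helper `split` are hypothetical; `countTokens()` is a `StringTokenizer` method that reports how many tokens remain.

```java
import java.util.StringTokenizer;

class NamePasswordSketch {
    // Hypothetical helper: splits a "name:password" string on the colon
    // and returns {name, password}. Assumes neither part contains a colon.
    static String[] split(String credentials) {
        StringTokenizer sTokenizer = new StringTokenizer(credentials, ":");
        if (sTokenizer.countTokens() != 2) {
            throw new IllegalArgumentException("Expected name:password");
        }
        String name = sTokenizer.nextToken();
        String password = sTokenizer.nextToken();
        return new String[] { name, password };
    }

    public static void main(String[] args) {
        String[] parts = split("boyd:14irXp");
        System.out.println("name = " + parts[0]);      // name = boyd
        System.out.println("password = " + parts[1]);  // password = 14irXp

        // A run of delimiters is skipped as a unit, so no empty
        // token appears between the two colons. Prints: 2
        System.out.println(
            new StringTokenizer("boyd::14irXp", ":").countTokens());
    }
}
```

Because empty tokens are never returned, this approach cannot distinguish a missing field from a doubled delimiter, so it is best suited to input in which every field is known to be non-empty.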
