Section 11.2 Data sets

The Pokémon CSV file from the previous section is an example of a data set: a collection of pieces of information that all have the same structure. If you've ever used a spreadsheet with rows of data organized into columns, you've dealt with a data set. For example, here is a bit of a spreadsheet containing data about students, with columns for their first name, last name, email, and GPA.
Figure 11.2.1. A spreadsheet
Spreadsheet programs like Google Sheets or Microsoft Excel store data in tables and provide functions to manipulate and calculate with the data. In this section we'll see how we can do similar things in a Java program.

Subsection 11.2.1 Working with data

Typically such programs work by looping over the values in the data set and accumulating specific values, for example by totaling or counting them. In other words, we can use the same algorithms we first learned about back in Section 5.3 Basic loop algorithms.
Sometimes looking at data in a spreadsheet can help us plan an algorithm to answer a question about the data. For example, how could we find the average GPA of the students in the table above? We would need to access each GPA value in the table, add the values up, and then divide the total by the number of rows.
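As a rough sketch of that accumulate-then-divide idea in Java, the method below totals an array of GPA values and divides by the count. The class name AverageGpa and the sample values are made up for illustration; they do not come from the spreadsheet in the figure.

```java
class AverageGpa {
    // Returns the average of the values, or 0.0 if the array is empty
    // (returning 0.0 for no data is an assumption made for this sketch).
    static double average(double[] gpas) {
        if (gpas.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (double gpa : gpas) {
            total += gpa; // accumulate the running total
        }
        return total / gpas.length;
    }

    public static void main(String[] args) {
        double[] gpas = {3.0, 4.0, 3.5}; // made-up sample values
        System.out.println(average(gpas)); // prints 3.5
    }
}
```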

Activity 11.2.1.

Put the following pseudo code algorithm steps in the correct order for calculating the average GPA of the students in the table above. Make sure some of the steps are indented to be inside the loop. Click on Check Me to see if you are right.

Activity 11.2.2.

What if your school wants to print out an honor roll of students?
Write pseudo code for an algorithm that prints out the names of students with a GPA of 3.5 or higher given a table of student data where one column is the GPA. Make sure you use repetition (loop) and selection (if) in your algorithm.

Subsection 11.2.2 Parsing data

In the previous section we saw how we can use methods like Scanner.nextLine or Files.readAllLines to get the contents of a file broken up into individual lines. But if we want to do computations with that data we will usually need to extract the individual parts of each line and maybe convert them into types like int or double to work with. The process of extracting structure from a linear sequence, such as the characters in a String, is called parsing.
In this section we'll look at some relatively simple kinds of parsing, turning the lines in a CSV file like pokemon.csv from the previous section into data we can work with.
If you take a look at the Pokémon CSV file, you'll notice that each line contains multiple values separated by commas. These values are the attributes of a single Pokémon: its name, type, speed, etc. Often the first line of a CSV file is a header line, consisting of a comma-separated list of names of the attributes represented by each column. For instance, the Pokémon CSV file starts with this header line:
Number,Pokemon,Type 1,Type 2,HP,Attack,Defense,Speed,PNG,Description
So our first task, if we want to do something more interesting with the Pokémon data than just counting how many lines are in the file, is to parse each line into the individual attribute values.
The String class provides a useful method split(String delimiter) that splits a string into an array of String values based on a specified delimiter, which is a character like a comma or a space that separates the units of data. This method returns a String[] where each element in the array contains one substring from the original string.
String sentence = "A quick brown fox jumps";
// Split the sentence into words along spaces to create:
//  words = {"A", "quick", "brown", "fox", "jumps"}
String[] words = sentence.split(" ");
Here is an example of how to use the split method to split a line from the Pokémon CSV file into individual attributes. Looking at the header in the first line of the file, we can see that the 0th element of the data array is the Pokémon's number, element 1 is the name, etc. We only need to save the data that we want to use. In this case, we want to save the name, type1, and imageFile.
// Split the line of data into an array of Strings
String[] data = line.split(",");
// Identify the data
// data: Number,Name,Type1,Type2,HP,Attack,Defense,Speed,PNG,Description
String name = data[1];
String type1 = data[2];
String imageFile = data[8];
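To try the same indexing on a concrete line, here is a small self-contained sketch. The class name SplitLineDemo, the helper extract, and the sample row are all invented for illustration (the row follows the header's column order but is not copied from the file).

```java
class SplitLineDemo {
    // Returns {name, type1, imageFile} taken from one CSV line.
    // Assumes the line has at least 9 comma-separated values.
    static String[] extract(String line) {
        String[] data = line.split(",");
        return new String[] {data[1], data[2], data[8]};
    }

    public static void main(String[] args) {
        // Made-up row in the column order of the header line
        String line = "1,Bulbasaur,Grass,Poison,45,49,49,45,bulbasaur.png,A seed Pokemon";
        String[] parts = extract(line);
        // prints Bulbasaur (Grass): bulbasaur.png
        System.out.println(parts[0] + " (" + parts[1] + "): " + parts[2]);
    }
}
```

One thing to keep in mind: a plain split(",") also splits on commas that appear inside a value. In this layout only the last column (Description) is free text, so extra commas there would not shift indexes 1, 2, or 8, but a comma inside an earlier column would.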
Try the exercise below to display Pokémon images using the split method to extract names and URLs saved in the file.

Activity 11.2.3.

This program reads in some of the data from the Pokémon file into a String array of lines. Complete the randomPokemon method to print out a random Pokémon name and its image using the split method. Run the program multiple times to see different Pokémon names and images.

Subsection 11.2.3 Parsing strings to numbers

For values like a Pokémon's name, using String is fine. But attributes such as HP, Attack, Defense, and Speed are numbers, so if we want to answer questions about them, we will need to convert the String values we get from split into int or double values we can do math with.
The Integer and Double wrapper classes we learned about in Section 10.2 Wrapper classes also have some useful static methods for converting String values to int and double values. They are Integer.parseInt and Double.parseDouble.
  • static int parseInt(String s) returns the String argument as a signed int.
  • static double parseDouble(String s) returns the String argument as a signed double.
These methods are also listed on the Java Quick Reference Sheet provided during the AP exam. Here's an example of using these methods:

Activity 11.2.4.

Run the code below to see useful parse methods in the Integer and Double wrapper classes.
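A minimal sketch of the two parse methods follows. The class name ParseDemo, the sample strings, and the helper parseOrDefault are hypothetical and are not part of the Java library or the AP reference sheet; the helper only illustrates that Integer.parseInt throws a NumberFormatException when the text is not a valid int.

```java
class ParseDemo {
    // Hypothetical helper: returns the parsed int, or fallback if s is not a valid int.
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        String speedText = "45";  // made-up sample values
        String gpaText = "3.75";

        int speed = Integer.parseInt(speedText);
        double gpa = Double.parseDouble(gpaText);

        System.out.println(speed + 5);     // prints 50 (int addition)
        System.out.println(speedText + 5); // prints 455 (String concatenation)
        System.out.println(gpa * 2);       // prints 7.5

        System.out.println(parseOrDefault("fast", -1)); // prints -1
    }
}
```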

Subsection 11.2.4 Coding Challenge: Pokémon speed

In the last lesson, we read in a file of Pokémon data. In this exercise, we will read in the file, calculate the average Pokémon speed, and find the Pokémon with the highest speed. The speed is the 8th column in the file; because array indexes start at 0, the speed is at index 7 once a row of data has been split into an array. We will use the Integer.parseInt method to convert the speed from a String to an int so we can compare the speeds.

Project 11.2.5.

This program reads in each line from the pokemon file into a String array of lines. Complete the findMaxSpeed and findAverageSpeed methods below.

Subsection 11.2.5 Representing CSV data with objects

While we can always get at the data from each row of a CSV file by splitting individual lines into String arrays and then parsing the individual values as needed, it's a bit cumbersome to keep track of which indexes go with which attributes and to read code that uses those indexes everywhere.
We can make our code a lot easier to understand if we define a class to represent the set of attributes from a single line. In this case that would be a Pokemon class where each instance represents a single Pokémon.
Then we can load our CSV file into an ArrayList of Pokemon objects by parsing each line (except the header) into the attributes for one Pokémon and using them to construct a Pokemon object. Then in the rest of our program we can use getters defined in the Pokemon class to access the attributes of different Pokémon without having to remember which index to use.
To use this approach we must first create a Pokemon class with instance variables for the attributes we care about, and a constructor that initializes these variables. Assuming that we have already written the Pokemon class and constructor, and that scan is a Scanner already opened on the file, the following code creates a Pokemon object from each line of data using the constructor and adds it to an ArrayList of Pokemon objects.
// Create a list to hold Pokemon objects
ArrayList<Pokemon> pokemon = new ArrayList<>();

// Skip the header line
if (scan.hasNextLine()) scan.nextLine();

while (scan.hasNextLine()) {
    String line = scan.nextLine();
    String[] data = line.split(",");

    // Create a Pokemon object from the data and add to list.
    pokemon.add(new Pokemon(data[1], data[2], Integer.parseInt(data[7]), data[8]));
}
Note that this code skips the first line in the file since it contains column headers, not data about a Pokémon. Usually CSV files have a header line but not always; we need to know whether or not they do to process them correctly.
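For reference, one possible shape for a Pokemon class that would fit the constructor call in the loop is sketched here. The field names, getter names, and the sample values in main are assumptions; the section does not show the actual class.

```java
class Pokemon {
    // Instance variables for the four attributes passed to the constructor
    private String name;
    private String type1;
    private int speed;
    private String imageFile;

    // Parameter order matches new Pokemon(data[1], data[2], Integer.parseInt(data[7]), data[8])
    public Pokemon(String name, String type1, int speed, String imageFile) {
        this.name = name;
        this.type1 = type1;
        this.speed = speed;
        this.imageFile = imageFile;
    }

    public String getName() { return name; }
    public String getType() { return type1; }
    public int getSpeed() { return speed; }
    public String getImageFile() { return imageFile; }

    public static void main(String[] args) {
        // Made-up values for illustration
        Pokemon p = new Pokemon("Pikachu", "Electric", 90, "pikachu.png");
        System.out.println(p.getName() + " has speed " + p.getSpeed()); // prints Pikachu has speed 90
    }
}
```

With getters like these, the rest of the program can ask for p.getSpeed() instead of remembering that speed lives at index 7.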

Subsection 11.2.6 Coding Challenge: list of Pokémon from input file

Let's end with a challenge that combines all the skills you have learned so far. You could work in pairs for this challenge.

Project 11.2.6.

Create a class Pokemon that has at least three attributes that can be found in the Pokémon file, including its name, type1, and imageFile, and any other attributes from the file that you would like. Write a constructor and getters for these attributes.
Then, read in the data from the pokemon file, skipping the header row, split each line, and save the data in an array of Pokemon objects.
Write a findType method that returns the type of a Pokémon given its name as an argument. It should loop through the array to find the right Pokemon object using the getName and getType methods that you will write. It should also display the image for the Pokémon.

Subsection 11.2.7 Optional challenge with a dataset

If your class has time, you can try the following open-ended challenge that uses a dataset of your choice. You could work in pairs for this challenge. Choose a dataset from the files pokemon.csv, WorldIndicators2000.csv, or StateData2020-CDC-Census.csv to read into an array of objects. The activecode window below is set up to use these files. Look at the columns of the dataset you have chosen at the bottom of this web page to decide on the name and at least 3 attributes for your class. Each row in the data file will be an object of your class that you will add to the array. If you find another data CSV file online that you would like to use, you can read from a URL instead of a file in Java using the java.net package following the directions here https://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html.
After you have chosen an input file, use the Pokémon exercise in the section above as a guide to:
  1. Design a class for the input file that you have chosen. Choose at least 3 attributes that can be found in the file for your class. Write a constructor that takes in these attributes as parameters and saves them into instance variables. You may need to add some getters and a toString method as well.
  2. Declare an array of your class type.
  3. Read in the data from the file.
  4. Inside a loop, split each line into its attributes and create an object for your class using its constructor. Add the object to the array.
  5. Do something interesting with the data using a loop, for example you could find the maximum or minimum value of an attribute or print out all the objects that have the same attribute value.
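The steps above can be sketched on a tiny made-up data set. Everything here is hypothetical: the Student and StudentData classes, the column order (first name, last name, email, GPA), and the sample rows. To keep the sketch self-contained, the lines are given as a String array instead of being read from a file (step 3).

```java
class Student {
    // Step 1: a class with three attributes taken from the columns
    private String firstName;
    private String lastName;
    private double gpa;

    public Student(String firstName, String lastName, double gpa) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.gpa = gpa;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public double getGpa() { return gpa; }
}

class StudentData {
    // Steps 2 and 4: declare the array, then split each line and construct an object.
    // Assumes lines[0] is a header line, so lines has at least one element.
    static Student[] parse(String[] lines) {
        Student[] students = new Student[lines.length - 1];
        for (int i = 1; i < lines.length; i++) {
            String[] data = lines[i].split(",");
            students[i - 1] = new Student(data[0], data[1], Double.parseDouble(data[3]));
        }
        return students;
    }

    // Step 5: find the student with the highest GPA (assumes at least one student).
    static Student highestGpa(Student[] students) {
        Student best = students[0];
        for (Student s : students) {
            if (s.getGpa() > best.getGpa()) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] lines = {
            "First,Last,Email,GPA",            // header line
            "Ana,Lopez,ana@example.com,3.2",   // made-up rows
            "Ben,Chen,ben@example.com,3.9",
            "Cara,Singh,cara@example.com,3.5"
        };
        Student best = highestGpa(parse(lines));
        System.out.println(best.getFirstName() + " " + best.getLastName()); // prints Ben Chen
    }
}
```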

Project 11.2.7.

Input File Challenge: Design the class for your input file that has at least 3 attributes that can be found in the file. Then, read in the data from the file, split each line, and save the data in an array of objects. Finally, do something interesting with the data using a loop, for example you could find the object with the max or min attribute value or print out all the objects of a certain attribute value. You can use the files pokemon.csv, WorldIndicators2000.csv, or StateData2020-CDC-Census.csv.

Subsection 11.2.8 Summary

  • (AP 4.2.A.1) A data set is a collection of specific pieces of information or data.
  • (AP 4.2.A.2) Data sets can be manipulated and analyzed to solve a problem or answer a question. When analyzing data sets, values within the set are accessed and utilized one at a time and then processed according to the desired outcome.
  • (AP 4.2.A.3) Data can be represented in a diagram by using a chart or table. This visual can be used to plan the algorithm that will be used to manipulate the data.
  • (AP 4.6.A.1) A file is storage for data that persists when the program is not running. The data in a file can be retrieved during program execution.
  • (AP 4.6.A.2) A file can be connected to the program using the File and Scanner classes.
  • (AP 4.6.A.3) A file can be opened by creating a File object, using the name of the file as the argument of the constructor. File(String str) is the File constructor that accepts a String file name to open for reading, where str is the pathname for the file.
  • (AP 4.6.A.4) When using the File class, it is required to indicate what to do if the file with the provided name cannot be opened. One way to accomplish this is to add throws IOException to the header of the method that uses the file. If the file name is invalid, the program will terminate.
  • (AP 4.6.A.5) The File and IOException classes are part of the java.io package. An import statement must be used to make these classes available for use in the program.
  • (AP 4.6.A.6) The following Scanner methods and constructor, including what they do and when they are used, are part of the Java Quick Reference (p. 2) provided during the AP CSA exam:
    • Scanner(File f) the Scanner constructor that accepts a File for reading.
    • int nextInt() returns the next int read from the file or input source. If the next int does not exist, it will result in an InputMismatchException. Note that this method does not read the end of the line, so the next call to nextLine() will return the rest of the line which will be empty.
    • double nextDouble() returns the next double read from the file or input source. If the next double does not exist, it will result in an InputMismatchException.
    • boolean nextBoolean() returns the next boolean read from the file or input source. If the next boolean does not exist, it will result in an InputMismatchException.
    • String nextLine() returns the next line of text up until the end of the line as a String read from the file or input source; returns the empty string if called immediately after another Scanner method like nextInt that is reading from the file or input source; throws a NoSuchElementException if there is no next line.
    • String next() returns the next String up until a white space read from the file or input source.
    • boolean hasNext() returns true if there is a next item to read in the file or input source; false otherwise.
    • void close() closes the input stream.
  • (AP 4.6.A.7) Using nextLine and the other Scanner methods together on the same input source sometimes requires code to adjust for the methods' different ways of handling whitespace.
  • (AP 4.6.A.8) The following additional String method, including what it does and when it is used, is part of the Java Quick Reference provided during the AP CSA Exam:
    • String[] split(String del) returns a String array where each element is a substring of this String, which has been split around matches of the given expression del.
    • For example, String[] data = line.split(","); splits a line from a csv file along the commas and saves the substrings into the array data.
  • (AP 4.6.A.9) A while loop can be used to detect if the file still contains elements to read by using the hasNext method as the condition of the loop.
  • (AP 4.6.A.10) A file should be closed when the program is finished using it. The close method from Scanner is called to close the file.
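Several of the points above (File, Scanner, throws IOException, a while loop over the remaining lines, and close) can be seen together in one short sketch. The class name FileSummaryDemo and the method countDataRows are hypothetical, and the code uses hasNextLine, as in the earlier loop, to pair with nextLine. The main method assumes a file named pokemon.csv is in the working directory; if it is not, the program ends with an exception.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class FileSummaryDemo {
    // Counts the lines after the header line in a CSV file.
    static int countDataRows(File file) throws IOException {
        Scanner scan = new Scanner(file); // connects the file to the program
        int count = 0;

        // Skip the header line, if there is one
        if (scan.hasNextLine()) {
            scan.nextLine();
        }

        // Keep reading while the file still has lines
        while (scan.hasNextLine()) {
            String[] data = scan.nextLine().split(",");
            if (data.length > 1) { // ignore lines that do not have at least two values
                count++;
            }
        }

        scan.close(); // close the file when finished
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countDataRows(new File("pokemon.csv")));
    }
}
```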