5.7. Glossary

5.7.1. Definitions

Cell: A rectangular box containing text or code in a notebook.

Code Cell: A cell in JupyterLab that you can program in. It uses Python 3 as its programming language.

Data Frame: A data frame is a two-dimensional table of data organized into labeled rows and columns. Data frames often hold a subset of a larger dataset, so that specific data operations can be applied without using the entire dataset. (In pandas it is called a DataFrame.)

Explicit Index: Uses the values (numeric or non-numeric) set as the index. For example, if we set a column or row as the index, then we can use the values in that row or column as indices in different pandas methods.

Implicit Index: Uses the integer position (0, 1, 2, and so on) of each entry, similar to the Python style of indexing.

Index: An index is a value that identifies a position (address) in a DataFrame or Series.
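The difference between an explicit and an implicit index can be sketched with a small Series. The city names and numbers below are made up for illustration only.

```python
import pandas as pd

# Hypothetical data: three values labeled by city name.
populations = pd.Series(
    [8.4, 3.9, 2.7],
    index=["New York", "Los Angeles", "Chicago"],
)

# Explicit index: look up a value by the label that was set as the index.
by_label = populations.loc["Chicago"]

# Implicit index: look up a value by its integer position (counting from 0).
by_position = populations.iloc[2]

print(by_label, by_position)
```

Both lookups reach the same entry here, once by its label and once by its position.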

Markdown: Markdown is a lightweight markup language written in plain text. Markdown text can be converted to HTML, XHTML, PDF, and other formats. Refer to the relevant appendix for more about Markdown.

Series: A series is a one-dimensional labeled array of related data values that share a connecting factor or property, such as a single column of a DataFrame.

Text Cell: A cell in JupyterLab that you can write text in. The text is written in a language called Markdown.

5.7.2. Keywords

import: Lets programmers use packages, libraries, or modules that have already been written.

<DataFrame>[<string>]: Returns the Series corresponding to the given column name (<string>).

<DataFrame>[<list of strings>]: Returns the given set of columns as a DataFrame.

<DataFrame>[<series/list of Boolean>]: Returns the rows of the DataFrame whose positions hold True in the given Boolean list or Series. Rows whose positions hold False are left out.
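The three bracket forms above can be sketched as follows. The table contents and the variable names are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame({
    "name": ["Ada", "Ben", "Cy"],
    "age": [36, 17, 52],
    "city": ["Oslo", "Rome", "Lima"],
})

# A single column name gives back a Series.
ages = df["age"]

# A list of column names gives back a DataFrame with just those columns.
subset = df[["name", "city"]]

# A Boolean Series keeps only the rows where the value is True.
is_adult = df["age"] >= 18
adults = df[is_adult]

print(adults)
```

Note the double brackets in df[["name", "city"]]: the inner pair builds the list of column names, and the outer pair does the selection.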

<DataFrame>.loc[ ]: Uses explicit indexing (index labels) to return the rows with the given indices and the values associated with them.

<DataFrame>.loc[<string1>:<string2>]: Takes a range of explicit indices and returns a DataFrame containing those indices and the values associated with them. Both endpoints (<string1> and <string2>) are included.

<DataFrame>.loc[<string>]: Uses an explicit index and returns the row(s) for that index value.

<DataFrame>.loc[<list/series of strings>]: Returns a new DataFrame containing the rows whose labels are given in the list of strings.
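A minimal sketch of the .loc forms listed above, assuming a small made-up DataFrame whose index holds fruit names:

```python
import pandas as pd

# Hypothetical data: prices labeled by fruit name.
df = pd.DataFrame(
    {"price": [1.2, 0.5, 2.0, 3.1]},
    index=["apple", "banana", "cherry", "date"],
)

# A single label returns the row for that label (as a Series here,
# because the label appears only once in the index).
one_row = df.loc["banana"]

# A label range includes BOTH endpoints.
label_range = df.loc["banana":"cherry"]

# A list of labels returns those rows in the order listed.
picked = df.loc[["date", "apple"]]

print(label_range)
```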

<DataFrame>.iloc[ ]: Uses implicit indexing (integer positions) to return the rows at the given positions and the values associated with them.

<DataFrame>.iloc[<index, range of indices>]: Takes an implicit index (or a range of implicit indices) and returns the row(s) at those positions and the values associated with them. In a range, the end position is not included.
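A minimal sketch of .iloc, again with made-up data. The index labels are kept as fruit names to show that .iloc ignores them and counts positions from 0.

```python
import pandas as pd

# Hypothetical data: prices labeled by fruit name.
df = pd.DataFrame(
    {"price": [1.2, 0.5, 2.0, 3.1]},
    index=["apple", "banana", "cherry", "date"],
)

# A single position returns that row as a Series.
first = df.iloc[0]

# A range of positions excludes the end: this selects positions 1 and 2.
middle = df.iloc[1:3]

print(middle)
```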

<DataFrame>.set_index(<string>): Sets the existing column named <string> as the index of the DataFrame.

<DataFrame>.head(<numeric>): Returns the first <numeric> rows. If no parameter (<numeric>) is given, it returns the first five rows.

<pandas>.DataFrame(<data>): Used to create a DataFrame with the given data.

<pandas>.read_csv(): Used to read a CSV file into a DataFrame.

<DataFrame>.set_index(<column>): Takes the values of the given column and sets them as the indices. The order of the rows does not change; use sort_index() to sort the rows by the new indices.
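The sketch below puts read_csv(), head(), and set_index() together. To keep it self-contained, the CSV text is a made-up string read through io.StringIO; with a real file you would pass the file name to read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for a file on disk.
csv_text = (
    "country,capital,population\n"
    "Peru,Lima,33\n"
    "Chile,Santiago,19\n"
    "Kenya,Nairobi,53\n"
)

# Read the CSV data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Look at the first two rows only.
top_two = df.head(2)

# Use the "country" column as the index. This returns a new DataFrame
# and keeps the rows in their original order.
by_country = df.set_index("country")

print(by_country)
```

After set_index(), the country names can be used as explicit indices, for example by_country.loc["Chile"].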

<pandas>.to_numeric(): Converts what is inside the parentheses (for example, a Series of number-like strings) into numeric values.
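A short sketch of to_numeric(), assuming a made-up Series whose numbers were stored as text:

```python
import pandas as pd

# Hypothetical data: numbers stored as strings.
raw = pd.Series(["10", "2.5", "7"])

# Convert the strings to numeric values so arithmetic works on them.
numbers = pd.to_numeric(raw)

total = numbers.sum()
print(total)
```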

<series>.str.startswith(<string>): Checks whether each string in the series starts with the given parameter (<string>), and returns a Series of Boolean values (True or False), one for each entry.
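A minimal sketch of .str.startswith() with made-up names. The result holds one Boolean value per entry, so it can be used to select the matching entries.

```python
import pandas as pd

# Hypothetical data used only for illustration.
names = pd.Series(["Anna", "Bob", "Alan"])

# One True/False value per name.
mask = names.str.startswith("A")

# Use the Boolean Series to keep only the matching names.
a_names = names[mask]

print(a_names)
```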

<DataFrame>.sort_index(): Sorts the rows of the DataFrame by their index values. By default, the rows are sorted in ascending order.
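A short sketch of sort_index() on a made-up DataFrame. In pandas, sort_index() orders the rows by their index labels, not by the values in a column.

```python
import pandas as pd

# Hypothetical data: scores labeled by name, in no particular order.
df = pd.DataFrame(
    {"score": [70, 90, 80]},
    index=["carol", "alice", "bob"],
)

# Default: ascending order of the index labels.
ascending = df.sort_index()

# Pass ascending=False to reverse the order.
descending = df.sort_index(ascending=False)

print(ascending)
```

Each score stays attached to its label while the rows are reordered.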
