6.10. Glossary
6.10.1. Definitions
Aggregation: The compiling of information from databases with the intent to prepare combined datasets for data processing.
API: Stands for application programming interface, a software intermediary that allows applications to communicate with each other.
BeautifulSoup: A Python package that is used for parsing HTML documents.
Boolean Value: A data type with one of two possible values, either true or false.
CSV File: CSV stands for “comma-separated values,” and this format allows us to share data files in a simple text format.
Cascading Style Sheets (CSS): A language used for adding styles to web documents.
DataFrame: A commonly used two-dimensional pandas object whose columns can hold different types.
Dictionary: A data structure that is used for storing data. A dictionary has a set of keys, each of which is associated with a value.
Histogram: A diagram that uses rectangles to represent the distribution of numerical data.
HTML: Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser.
JSON: JavaScript Object Notation is a syntax that uses human-readable text to store and transmit data objects.
Mean: For a data set, the arithmetic mean is the sum of the values divided by the number of values.
Pandas: A Python library used for data analysis.
Pivot Method: The DataFrame.pivot(index, columns, values) method reshapes a DataFrame into a pivot table based on its columns. It uses the unique values from index and columns as the row and column labels of the result and fills the cells with the entries from values.
Scatter Plot: A diagram that plots the values of two variables along two axes (x-axis and y-axis). The pattern of the points reveals correlations present in the data.
Web Scraping (Screen Scraping): A technique for extracting large amounts of data from a website and saving it, typically in a spreadsheet-like format, to a local file on your computer or to a database.
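Several of the definitions above (CSV File, Dictionary, Mean, Boolean Value, and JSON) can be illustrated together in a few lines. The sketch below uses only the Python standard library. The city names and temperatures are made-up sample data, not taken from this book.

```python
import csv
import io
import json
from statistics import mean

# Hypothetical sample data in CSV (comma-separated values) form.
csv_text = "city,temp\nOslo,4\nCairo,30\nLima,20\n"

# Each CSV row becomes a dictionary: keys are column names, values are strings.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Dictionary: associate each city (key) with its temperature (value).
temps = {row["city"]: int(row["temp"]) for row in rows}

# Mean: the sum of the values divided by the number of values.
average = mean(temps.values())

# Boolean value: a comparison evaluates to either True or False.
is_warm = average > 15

# JSON: write the dictionary as human-readable text, then read it back.
json_text = json.dumps(temps)
restored = json.loads(json_text)

print(temps, average, is_warm, json_text)
```

Reading the JSON text back gives a dictionary equal to the original, which is why JSON is a convenient format for passing data between programs.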
6.10.2. Keywords
geoshape
Altair provides the geoshape mark to visualize geographic data.
mark_geoshape
Sets the chart mark to geoshape.
pivot_table
A table that summarizes the data of a more extensive table. The idea behind a pivot table is to take the unique values from some columns and make them the titles of new columns, while summarizing the data for those columns from several rows.
The following format is used to create a pivot_table in pandas:
DataFrame.pivot_table(index='', columns='', values='')
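As a minimal sketch of how pivot_table differs from the pivot method defined earlier in this glossary, the example below builds a small DataFrame and reshapes it both ways. The column names (region, year, sales) and the numbers are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical long-format data; West/2021 appears twice on purpose.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "year": [2020, 2021, 2020, 2021, 2021],
    "sales": [10, 20, 30, 40, 60],
})

# pivot_table aggregates rows that share the same (index, columns) pair.
# The default aggregation is the mean, so West/2021 is (40 + 60) / 2.
summary = df.pivot_table(index="region", columns="year", values="sales")

# pivot only reshapes and raises ValueError on duplicated pairs,
# so this sketch drops the duplicate row (keeping the first) beforehand.
unique_rows = df.drop_duplicates(subset=["region", "year"])
reshaped = unique_rows.pivot(index="region", columns="year", values="sales")

print(summary)
print(reshaped)
```

In both results the unique values of region become the row labels and the unique values of year become the column titles. Only pivot_table summarizes several rows into one cell.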