# Create a variable savings
savings = 100
# Print out savings
print(savings)
100
Use the arrays imported in the first cell to explore the data and practice your skills!
- Print out the weight of the first ten baseball players.
- What is the median weight of all baseball players in the data?
- Print out the names of all players with a height greater than 80 (heights are in inches).
- Who is taller on average, baseball players or soccer players? Keep in mind that baseball heights are stored in inches!
- The values in soccer_shooting are decimals. Convert them to whole numbers (e.g., 0.98 becomes 98).
- Do taller players get higher ratings? Calculate the correlation between soccer_ratings and soccer_heights to find out!
- What is the average rating for attacking players ('A')?
# Create the variables monthly_savings and num_months
monthly_savings = 10
num_months = 4
# Multiply monthly_savings and num_months
new_savings = monthly_savings * num_months
# Add new_savings to your savings
total_savings = savings + new_savings
# Print total_savings
print(total_savings)
140
# Create a variable half
half = 0.5
# Create a variable intro
intro = "Hello! How are you?"
# Create a variable is_good
is_good = True
monthly_savings = 10
num_months = 12
intro = "Hello! How are you?"
# Calculate year_savings using monthly_savings and num_months
year_savings = monthly_savings * num_months
# Print the type of year_savings
print(type(year_savings))
# Assign sum of intro and intro to doubleintro
doubleintro = intro + intro
# Print out doubleintro
print(doubleintro)
<class 'int'>
Hello! How are you?Hello! How are you?
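The output depends on the types of the operands. Multiplying two int values gives an int, mixing an int with a float gives a float, and + on two strings concatenates them. Here is a minimal sketch of these rules, reusing the values from the exercise:

```python
monthly_savings = 10
num_months = 12
intro = "Hello! How are you?"

# int * int stays an int
year_savings = monthly_savings * num_months
print(type(year_savings), year_savings)   # <class 'int'> 120

# int * float is promoted to a float
print(type(monthly_savings * 1.1))        # <class 'float'>

# + on two strings concatenates them; * repeats a string
print(intro + intro == intro * 2)         # True
```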
# Definition of savings and total_savings
savings = 100
total_savings = 150
# Fix the printout
print("I started with $" + str(savings) + " and now have $" + str(total_savings) + ". Awesome!")
# Definition of pi_string
pi_string = "3.1415926"
# Convert pi_string into float: pi_float
pi_float = float(pi_string)
I started with $100 and now have $150. Awesome!
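The printout wraps savings and total_savings in str() because Python will not convert a number for you when you join it to a string with +. It raises a TypeError instead. A minimal sketch of the failure and the fix:

```python
savings = 100
total_savings = 150

try:
    message = "I started with $" + savings   # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)

# Converting explicitly with str() makes the concatenation work
message = "I started with $" + str(savings) + " and now have $" + str(total_savings) + ". Awesome!"
print(message)

# float() goes the other way: it parses a numeric string
pi_float = float("3.1415926")
print(type(pi_float))   # <class 'float'>
```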
As opposed to int, bool etc., a list is a compound data type; you can group values together:
a = "is"
b = "nice"
my_list = ["my", "list", a, b]
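A list can group variables and literals, and len() reports how many elements it holds. The same list can also mix types. A short sketch (the mixed list is made up for illustration):

```python
a = "is"
b = "nice"
my_list = ["my", "list", a, b]
print(my_list)        # ['my', 'list', 'is', 'nice']
print(len(my_list))   # 4

# A list may also mix types: here a str, a float and a bool (made-up values)
mixed = ["hallway", 11.25, True]
print(type(mixed[0]), type(mixed[1]), type(mixed[2]))
# <class 'str'> <class 'float'> <class 'bool'>
```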
After measuring the height of your family, you decide to collect some information on the house you’re living in. The areas of the different parts of your house are stored in separate variables for now, as shown in the script.
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50
# Create list areas
areas = [hall, kit, liv, bed, bath]
# Print areas
print(areas)
[11.25, 18.0, 20.0, 10.75, 9.5]
A list can contain any Python type. Although it’s not really common, a list can also contain a mix of Python types including strings, floats, booleans, etc.
The printout of the previous exercise wasn’t really satisfying. It’s just a list of numbers representing the areas, but you can’t tell which area corresponds to which part of your house.
The code in the editor is the start of a solution. For some of the areas, the name of the corresponding room is already placed in front. Pay attention here! “bathroom” is a string, while bath is a variable that represents the float 9.50 you specified earlier.
- Finish the code that creates the areas list. Build the list so that it first contains the name of each room as a string and then its area. In other words, add the strings “hallway”, “kitchen” and “bedroom” at the appropriate locations.
- Print areas again; is the printout more informative this time?
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50
# Adapt list areas
areas = ["hallway", hall, "kitchen", kit, "living room", liv, "bedroom", bed, "bathroom", bath]
# Print areas
print(areas)
['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5]
As a data scientist, you’ll often be dealing with a lot of data, and it will make sense to group some of this data.
Instead of creating a flat list containing strings and floats, representing the names and areas of the rooms in your house, you can create a list of lists. The script in the editor can already give you an idea.
Don’t get confused here: “hallway” is a string, while hall is a variable that represents the float 11.25 you specified earlier.
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50
# house information as list of lists
house = [["hallway", hall],
         ["kitchen", kit],
         ["living room", liv],
         ["bedroom", bed],
         ["bathroom", bath]]
# Print out house
print(house)
# Print out the type of house
print(type(house))
[['hallway', 11.25], ['kitchen', 18.0], ['living room', 20.0], ['bedroom', 10.75], ['bathroom', 9.5]]
<class 'list'>
Subsetting Python lists is a piece of cake. Take the code sample below, which creates a list x and then selects “b” from it. Remember that this is the second element, so it has index 1. You can also use negative indexing.
x = ["a", "b", "c", "d"]
x[1]
x[-3] # same result!
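Negative indexes count from the end of the list. For a list of length n, x[-k] refers to the same element as x[n - k]. A quick check:

```python
x = ["a", "b", "c", "d"]
n = len(x)

print(x[1], x[-3])      # b b
print(x[-1], x[n - 1])  # d d

# Every negative index -k maps to the positive index n - k
for k in range(1, n + 1):
    assert x[-k] == x[n - k]
```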
Remember the areas list from before, containing both strings and floats? Its definition is already in the script. Can you add the correct code to do some Python subsetting?
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]
# Print out second element from areas
print(areas[1])
# Print out last element from areas
print(areas[-1])
# Print out the area of the living room
print(areas[5])
11.25
9.5
20.0
After you’ve extracted values from a list, you can use them to perform additional calculations. Take this example, where the second and fourth elements of a list x are extracted. The resulting strings are pasted together using the + operator:
x = ["a", "b", "c", "d"]
print(x[1] + x[3])
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]
# Sum of kitchen and bedroom area: eat_sleep_area
eat_sleep_area = areas[3] + areas[7]
# Print the variable eat_sleep_area
print(eat_sleep_area)
28.75
Selecting single values from a list is just one part of the story. It’s also possible to slice your list, which means selecting multiple elements from your list. Use the following syntax:
my_list[start:end]
The start index will be included, while the end index is not.
The code sample below shows an example. A list with “b” and “c”, corresponding to indexes 1 and 2, is selected from a list x:
x = ["a", "b", "c", "d"]
x[1:3]
The elements with index 1 and 2 are included, while the element with index 3 is not.
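The start index is included and the end index is excluded. As a result, a slice x[start:end] contains end - start elements, as long as both indexes lie within the list. Slicing also returns a new list and leaves the original untouched:

```python
x = ["a", "b", "c", "d"]

part = x[1:3]
print(part)       # ['b', 'c']
print(len(part))  # 2, i.e. 3 - 1

# The original list is unchanged
print(x)          # ['a', 'b', 'c', 'd']
```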
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]
# Use slicing to create downstairs
downstairs = areas[0:6]
# Use slicing to create upstairs
upstairs = areas[6:10]
# Print out downstairs and upstairs
print(downstairs)
print(upstairs)
['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0]
['bedroom', 10.75, 'bathroom', 9.5]
In the video, Hugo first discussed the syntax where you specify both where to begin and end the slice of your list:
my_list[begin:end]
However, it’s also possible not to specify these indexes. If you don’t specify the begin index, Python figures out that you want to start your slice at the beginning of your list. If you don’t specify the end index, the slice will go all the way to the last element of your list. To experiment with this, try the following commands in the IPython Shell:
x = ["a", "b", "c", "d"]
x[:2]
x[2:]
x[:]
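Because a missing start defaults to the beginning and a missing end defaults to the end, splitting a list at any position k and joining the two halves with + gives back the original list. A small sketch:

```python
x = ["a", "b", "c", "d"]

print(x[:2])  # ['a', 'b']
print(x[2:])  # ['c', 'd']
print(x[:])   # ['a', 'b', 'c', 'd'] (a full copy)

# x[:k] + x[k:] rebuilds x for every split point k
for k in range(len(x) + 1):
    assert x[:k] + x[k:] == x
```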
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]
# Alternative slicing to create downstairs
downstairs = areas[:6]
# Alternative slicing to create upstairs
upstairs = areas[6:]
You saw before that a Python list can contain practically anything; even other lists! To subset lists of lists, you can use the same technique as before: square brackets. Try out the commands in the following code sample in the IPython Shell:
x = [["a", "b", "c"],
     ["d", "e", "f"],
     ["g", "h", "i"]]
x[2][0]
x[2][:2]
x[2] results in a list that you can subset again by adding additional square brackets. What will house[-1][1] return?
house[-1][1]
9.5
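Subsetting a list of lists is simply two subsetting steps in a row. The first pair of brackets picks an inner list, and the second picks from that inner list. A sketch using the sample x and the house list from earlier:

```python
x = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]

row = x[2]           # the third inner list
print(row)           # ['g', 'h', 'i']
print(x[2][0])       # g
print(x[2][:2])      # ['g', 'h']

# The same idea on house: the last room, then its area
house = [["hallway", 11.25], ["kitchen", 18.0], ["living room", 20.0],
         ["bedroom", 10.75], ["bathroom", 9.50]]
print(house[-1][1])  # 9.5
```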
Replacing list elements is pretty easy. Simply subset the list and assign new values to the subset. You can select single elements or you can change entire list slices at once.
Use the IPython Shell to experiment with the commands below. Can you tell what’s happening and why?
x = ["a", "b", "c", "d"]
x[1] = "r"
x[2:] = ["s", "t"]
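Assigning to a single index replaces one element, and assigning to a slice replaces every element in that slice. The replacement list does not have to be the same length as the slice. A sketch that prints each step:

```python
x = ["a", "b", "c", "d"]

x[1] = "r"          # replace one element
print(x)            # ['a', 'r', 'c', 'd']

x[2:] = ["s", "t"]  # replace a whole slice
print(x)            # ['a', 'r', 's', 't']

# A slice can also be replaced by a list of a different length
x[1:3] = ["u"]
print(x)            # ['a', 'u', 't']
```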
For this and the following exercises, you’ll continue working on the areas list that contains the names and areas of different rooms in a house.
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]
# Correct the bathroom area
areas[-1] = 10.50
# Change "living room" to "chill zone"
areas[4] = "chill zone"
If you can change elements in a list, you sure want to be able to add elements to it, right? You can use the + operator:
x = ["a", "b", "c", "d"]
y = x + ["e", "f"]
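Note that + builds a new list rather than extending x in place, so x keeps its original four elements:

```python
x = ["a", "b", "c", "d"]
y = x + ["e", "f"]

print(x)       # ['a', 'b', 'c', 'd']
print(y)       # ['a', 'b', 'c', 'd', 'e', 'f']
print(y is x)  # False: y is a different list object
```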
You just won the lottery, awesome! You decide to build a poolhouse and a garage. Can you add the information to the areas list?
# Create the areas list and make some changes
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0,
         "bedroom", 10.75, "bathroom", 10.50]
# Add poolhouse data to areas, new list is areas_1
areas_1 = areas + ["poolhouse", 24.5]
# Add garage data to areas_1, new list is areas_2
areas_2 = areas_1 + ["garage", 15.45]
Finally, you can also remove elements from your list. You can do this with the del statement:
x = ["a", "b", "c", "d"]
del(x[1])
Pay attention here: as soon as you remove an element from a list, the indexes of the elements that come after the deleted element all change!
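The index shift is easy to see by printing the list before and after a deletion. Since del is a statement, del(x[1]) and del x[1] are equivalent; the sketch uses the form without parentheses.

```python
x = ["a", "b", "c", "d"]
print(x[2])  # c

del x[1]     # remove "b"
print(x)     # ['a', 'c', 'd']
print(x[2])  # d: "c" has moved from index 2 to index 1
```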
The updated and extended version of areas that you’ve built in the previous exercises is coded below. You can copy and paste this into the IPython Shell to play around with the result.
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0, "bedroom", 10.75, "bathroom", 10.50, "poolhouse", 24.5, "garage", 15.45]
There was a mistake! The amount you won with the lottery is not that big after all and it looks like the poolhouse isn’t going to happen. You decide to remove the corresponding string and float from the areas list.
The ; sign is used to place commands on the same line. The following two code chunks are equivalent:
# Same line
command1; command2
# Separate lines
command1
command2
Which of the code chunks will do the job for us? The correct choice is:
del(areas[-4:-2])
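In the 14-element areas list, index -4 points to "poolhouse" and index -2 points to "garage". Because the end index is excluded, the slice [-4:-2] covers exactly the poolhouse name and its area. A quick check:

```python
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0,
         "bedroom", 10.75, "bathroom", 10.50, "poolhouse", 24.5,
         "garage", 15.45]

print(areas[-4:-2])  # ['poolhouse', 24.5]
del areas[-4:-2]
print(areas)         # the poolhouse entries are gone, the garage is kept
```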
At the end of the video, Hugo explained how Python lists work behind the scenes. In this exercise you’ll get some hands-on experience with this.
The Python code in the script already creates a list with the name areas and a copy named areas_copy. Next, the first element in the areas_copy list is changed and the areas list is printed out. If you hit Run Code you’ll see that, although you’ve changed areas_copy, the change also takes effect in the areas list. That’s because areas and areas_copy point to the same list.
If you want to prevent changes in areas_copy from also taking effect in areas, you’ll have to do a more explicit copy of the areas list. You can do this with list() or by using [:].
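The is operator shows whether two names refer to the same list object. The sketch below compares plain assignment with the two copying techniques. Both list() and [:] make shallow copies. That is enough for a flat list of floats, but the inner lists of a list of lists (such as house) would still be shared.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

alias = areas         # a second name for the same list
copy_1 = list(areas)  # explicit copy with list()
copy_2 = areas[:]     # explicit copy with a full slice

print(alias is areas)   # True
print(copy_1 is areas)  # False
print(copy_2 is areas)  # False

alias[0] = 5.0          # also changes areas
copy_1[1] = 0.0         # does not touch areas
print(areas)            # [5.0, 18.0, 20.0, 10.75, 9.5]
```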
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Create areas_copy
areas_copy = list(areas)
# Change areas_copy
areas_copy[0] = 5.0
# Print areas
print(areas)
[11.25, 18.0, 20.0, 10.75, 9.5]
Out of the box, Python offers a bunch of built-in functions to make your life as a data scientist easier. You already know two such functions: print() and type(). You’ve also used the functions str(), int(), bool() and float() to switch between data types. These are built-in functions as well.
Calling a function is easy. To get the type of 3.0 and store the output as a new variable, result, you can use the following:
result = type(3.0)
The general recipe for calling functions and saving the result to a variable is thus:
output = function_name(input)
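Here is the same recipe applied to a few of the built-ins mentioned above:

```python
result = type(3.0)
print(result)       # <class 'float'>

var1 = [1, 2, 3, 4]
length = len(var1)  # input: a list, output: an int
print(length)       # 4

out2 = int(True)    # a bool converted to an int
print(out2)         # 1
```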
# Create variables var1 and var2
var1 = [1, 2, 3, 4]
var2 = True
# Print out type of var1
print(type(var1))
# Print out length of var1
print(len(var1))
# Convert var2 to an integer: out2
out2 = int(var2)
<class 'list'>
4
In the previous exercise, you identified optional arguments by viewing the documentation with help(). You’ll now apply this to change the behavior of the sorted() function.
Have a look at the documentation of sorted() by typing help(sorted) in the IPython Shell.
You’ll see that sorted() takes three arguments: iterable, key, and reverse.
key=None means that if you don’t specify the key argument, it will be None. reverse=False means that if you don’t specify the reverse argument, it will be False, by default.
In this exercise, you’ll only have to specify iterable and reverse, not key. The first input you pass to sorted() will be matched to the iterable argument, but what about the second input? To tell Python you want to specify reverse without changing anything about key, you can use = to assign it a new value:
sorted(____, reverse=____)
Two lists have been created for you. Can you paste them together and sort them in descending order?
Note: For now, we can understand an iterable as being any collection of objects, e.g., a list.
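A sketch of how the arguments line up. The first positional argument fills iterable, while reverse is passed by name, so key keeps its default of None. sorted() returns a new list and leaves its input unchanged:

```python
full = [11.25, 18.0, 20.0, 10.75, 9.50]

ascending = sorted(full)                 # reverse defaults to False
descending = sorted(full, reverse=True)  # key is left at None

print(ascending)   # [9.5, 10.75, 11.25, 18.0, 20.0]
print(descending)  # [20.0, 18.0, 11.25, 10.75, 9.5]
print(full)        # [11.25, 18.0, 20.0, 10.75, 9.5] (unchanged)
```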
# Create lists first and second
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]
# Paste together first and second: full
full = first + second
# Sort full in descending order: full_sorted
full_sorted = sorted(full, reverse=True)
# Print out full_sorted
print(full_sorted)
[20.0, 18.0, 11.25, 10.75, 9.5]
Strings come with a bunch of methods. Follow the instructions closely to discover some of them. If you want to discover them in more detail, you can always type help(str) in the IPython Shell.
A string place has already been created for you to experiment with.
# string to experiment with: place
place = "poolhouse"
# Use upper() on place: place_up
place_up = place.upper()
# Print out place and place_up
print(place)
print(place_up)
# Print out the number of o's in place
print(place.count("o"))
poolhouse
POOLHOUSE
3
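Strings are immutable, so methods such as upper() return a new string and leave the original as it was. A short sketch with place; replace() is another standard str method, shown only as an extra illustration:

```python
place = "poolhouse"

place_up = place.upper()
print(place)             # poolhouse (unchanged)
print(place_up)          # POOLHOUSE
print(place.count("o"))  # 3

print(place.replace("pool", "guest"))  # guesthouse
```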
Strings are not the only Python types that have methods associated with them. Lists, floats, integers and booleans are also types that come packaged with a bunch of useful methods. In this exercise, you’ll be experimenting with:
- index(), to get the index of the first element of a list that matches its input, and
- count(), to get the number of times an element appears in a list.
You’ll be working on the list with the area of different parts of a house: areas.
Instructions:
- Use the index() method to get the index of the element in areas that is equal to 20.0. Print out this index.
- Call count() on areas to find out how many times 9.50 appears in the list. Again, simply print out this number.
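index() reports only the first match, and it raises a ValueError when the value is missing, whereas count() simply returns 0. In the sketch below a second 20.0 is appended to areas for illustration only:

```python
# areas with an extra 20.0 added for illustration
areas = [11.25, 18.0, 20.0, 10.75, 9.50, 20.0]

print(areas.index(20.0))  # 2, the first of two matches
print(areas.count(20.0))  # 2
print(areas.count(99.0))  # 0

try:
    areas.index(99.0)
except ValueError as err:
    print("ValueError:", err)
```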
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Print out the index of the element 20.0
print(areas.index(20.0))
# Print out how often 9.50 appears in areas
print(areas.count(9.50))
2
1
Most list methods will change the list they’re called on. Examples are:
- append(), which adds an element to the list it is called on,
- remove(), which removes the first element of a list that matches the input, and
- reverse(), which reverses the order of the elements in the list it is called on.
You’ll be working on the list with the area of different parts of the house: areas.
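These list methods modify the list in place and return None, which is why their result should not be assigned back to the list. A minimal sketch:

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

result = areas.append(24.5)  # modifies areas, returns None
print(result)                # None
print(areas)                 # [11.25, 18.0, 20.0, 10.75, 9.5, 24.5]

areas.remove(18.0)           # removes the first 18.0 only
areas.reverse()              # reverses in place
print(areas)                 # [24.5, 9.5, 10.75, 20.0, 11.25]
```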
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Use append twice to add poolhouse and garage size
areas.append(24.5)
areas.append(15.45)
# Print out areas
print(areas)
# Reverse the order of the elements in areas
# reverse() works in place and returns None, so don't assign its result to areas
areas.reverse()
# Print out areas
print(areas)
[11.25, 18.0, 20.0, 10.75, 9.5, 24.5, 15.45]
[15.45, 24.5, 9.5, 10.75, 20.0, 18.0, 11.25]
As a data scientist, some notions of geometry never hurt. Let’s refresh some of the basics.
For a fancy clustering algorithm, you want to find the circumference, $C$, and area, $A$, of a circle. When the radius of the circle is $r$, you can calculate $C$ and $A$ as:
$$C = 2 \pi r$$
$$A = \pi r^2$$
In Python, the symbol for exponentiation is `**`. This operator raises the number to its left to the power of the number to its right. For example, `3 ** 4` is 3 to the power of 4 and will give 81.
To use the constant pi, you’ll need the math package. A variable r is already coded in the script. Fill in the code to calculate C and A and see how the print() functions create some nice printouts.
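The two formulas translate directly into Python, with `**` for the square. A minimal sketch using the same radius:

```python
import math

print(3 ** 4)         # 81

r = 0.43
C = 2 * math.pi * r   # circumference: C = 2 * pi * r
A = math.pi * r ** 2  # area: A = pi * r^2
print("Circumference: " + str(C))
print("Area: " + str(A))
```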
# Import the math package
import math
# Definition of radius
r = 0.43
# Calculate C
C = 2 * math.pi * r
# Calculate A
A = math.pi * r * r
# Build printout
print("Circumference: " + str(C))
print("Area: " + str(A))
Circumference: 2.701769682087222
Area: 0.5808804816487527
General imports, like import math, make all functionality from the math package available to you. However, if you decide to only use a specific part of a package, you can always make your import more selective:
from math import pi
Let’s say the Moon’s orbit around planet Earth is a perfect circle, with a radius r (in km) that is defined in the script.
# Import radians function of math package
from math import radians
# Definition of radius
r = 192500
# Travel distance of Moon over 12 degrees. Store in dist.
dist = r * radians(12)
# Print out dist
print(dist)
40317.10572106901
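The calculation relies on the arc-length formula distance = r × θ, where θ must be in radians. radians(12) is just 12 × π / 180, so both routes give the same distance:

```python
from math import isclose, pi, radians

r = 192500           # orbit radius in km
theta = radians(12)  # 12 degrees converted to radians

print(isclose(theta, 12 * pi / 180))  # True
dist = r * theta
print(dist)          # roughly 40317.1 km
```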
In this chapter, we’re going to dive into the world of baseball. Along the way, you’ll get comfortable with the basics of numpy, a powerful package to do data science.
A list baseball has already been defined in the Python script, representing the height of some baseball players in centimeters. Can you add some code here and there to create a numpy array from it?
# Import the numpy package as np
import numpy as np
# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]
# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out type of np_baseball
print(type(np_baseball))
<class 'numpy.ndarray'>
You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is stored as a regular Python list: height_in. The height is expressed in inches. Can you make a numpy array out of it and convert the units to meters?
height_in is already available and the numpy package is loaded, so you can start straight away (Source: stat.ucla.edu).
height_in = baseball_heights.astype(int)
height_in[0:10]
weight_lb = baseball_weights.astype(int)
weight_lb
array([180, 215, 210, ..., 205, 190, 195])
# Import numpy
import numpy as np
# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)
# Print out np_height_in
print(np_height_in)
# Convert np_height_in to m: np_height_m
np_height_m = 0.0254 * np_height_in
# Print np_height_m
[74 74 72 ... 75 75 73]
[1.8796 1.8796 1.8288 ... 1.905 1.905 1.8542]
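Multiplying a NumPy array by a number scales every element, whereas the same * on a plain list repeats the list. A small sketch with three made-up heights:

```python
import numpy as np

height_in = [74, 72, 75]    # made-up sample heights, in inches

print(height_in * 2)        # list: [74, 72, 75, 74, 72, 75]

np_height_in = np.array(height_in)
np_height_m = np_height_in * 0.0254
print(np_height_m)          # element-wise: 74*0.0254, 72*0.0254, 75*0.0254
```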
The MLB also offers to let you analyze their weight data. Again, both are available as regular Python lists: height_in and weight_lb. height_in is in inches and weight_lb is in pounds.
It’s now possible to calculate the BMI of each baseball player. Python code to convert height_in to a numpy array with the correct units is already available in the workspace. Follow the instructions step by step and finish the game! height_in and weight_lb are available as regular lists.
Instructions:
- Create a numpy array from the weight_lb list with the correct units. Multiply by 0.453592 to go from pounds to kilograms. Store the resulting numpy array as np_weight_kg.
- Use np_height_m and np_weight_kg to calculate the BMI of each player. Use the following equation:
$$\mathrm{BMI} = \frac{\mathrm{weight\ (kg)}}{\mathrm{height\ (m)}^2}$$
- Save the resulting numpy array as bmi.
- Print out bmi.
# Import numpy
import numpy as np
# Create array from height_in with metric units: np_height_m
np_height_m = np.array(height_in) * 0.0254
# Create array from weight_lb with metric units: np_weight_kg
np_weight_kg = 0.453592 * np.array(weight_lb)
# Calculate the BMI: bmi
bmi = np_weight_kg / np_height_m ** 2
# Print out bmi
print(bmi)
[23.11037639 27.60406069 28.48080465 ... 25.62295933 23.74810865
25.72686361]
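Applying the same element-wise formula to two hypothetical players makes it easy to check one value by hand. The first player (74 in, 180 lb) gives about 81.65 kg / 1.8796² m² ≈ 23.11:

```python
import numpy as np

height_in = [74, 72]    # hypothetical players, in inches
weight_lb = [180, 215]  # hypothetical players, in pounds

np_height_m = np.array(height_in) * 0.0254
np_weight_kg = np.array(weight_lb) * 0.453592
bmi = np_weight_kg / np_height_m ** 2  # one BMI per player
print(bmi)
```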
To subset both regular Python lists and numpy arrays, you can use square brackets:
x = [4, 9, 6, 3, 1]
x[1]
import numpy as np
y = np.array(x)
y[1]
For numpy specifically, you can also use boolean numpy arrays:
high = y > 5
y[high]
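The comparison y > 5 produces a boolean array of the same length as y. Using that array inside the brackets keeps only the elements where it is True:

```python
import numpy as np

x = [4, 9, 6, 3, 1]
y = np.array(x)

print(x[1], y[1])  # 9 9: single-element subsetting works the same

high = y > 5
print(high)        # [False  True  True False False]
print(y[high])     # [9 6]
```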
The code that calculates the BMI of all baseball players is already included. Follow the instructions and reveal interesting things from the data! height_in and weight_lb are available as regular lists.
Instructions:
- Create a boolean numpy array: the element of the array should be True if the corresponding baseball player’s BMI is below 21. You can use the < operator for this. Name the array light.
- Print the array light.
- Print out a numpy array with the BMIs of all baseball players whose BMI is below 21. Use light inside square brackets to do a selection on the bmi array.
# Import numpy
import numpy as np
# Calculate the BMI: bmi
np_height_m = np.array(height_in) * 0.0254
np_weight_kg = np.array(weight_lb) * 0.453592
bmi = np_weight_kg / np_height_m ** 2
# Create the light array
light = bmi < 21
# Print out light
print(light)
# Print out BMIs of all baseball players whose BMI is below 21
print(bmi[light])
[False False False ... False False False]
[20.54255679 20.54255679 20.69282047 20.69282047 20.34343189 20.34343189
20.69282047 20.15883472 19.4984471 20.69282047 20.9205219 ]
As Hugo explained before, numpy is great for doing vector arithmetic. If you compare its functionality with regular Python lists, however, some things have changed.
First of all, numpy arrays cannot contain elements with different types. If you try to build such an array, some of the elements’ types are changed so that the array ends up homogeneous. This is known as type coercion.
Second, the typical arithmetic operators, such as +, -, * and / have a different meaning for regular Python lists and numpy arrays.
Have a look at this line of code:
np.array([True, 1, 2]) + np.array([3, 4, False])
array([4, 5, 2])
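Two rules produce this result. True and False are coerced to 1 and 0, and + on arrays adds element by element, whereas + on lists concatenates them. A short sketch that also shows coercion to strings when a str is mixed in:

```python
import numpy as np

mixed = np.array([1, "a", True])
print(mixed.dtype.kind)  # 'U': every element became a string

print([True, 1, 2] + [3, 4, False])                      # lists: concatenation, 6 elements
print(np.array([True, 1, 2]) + np.array([3, 4, False]))  # arrays: [4 5 2]
```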
You’ve seen it with your own eyes: Python lists and numpy arrays sometimes behave differently. Luckily, there are still certainties in this world. For example, subsetting (using the square bracket notation on lists or arrays) works exactly the same. To see this for yourself, try the following lines of code in the IPython Shell:
x = ["a", "b", "c"]
x[1]
np_x = np.array(x)
np_x[1]
The script in the editor already contains code that imports numpy as np and stores both the height and weight of the MLB players as numpy arrays. height_in and weight_lb are available as regular lists.
# Import numpy
import numpy as np
# Store weight and height lists as numpy arrays
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)
# Print out the weight at index 50
print(np_weight_lb[50])
# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])
200
[73 74 72 73 69 72 73 75 75 73 72]
Before working on the actual MLB data, let’s try to create a 2D numpy array from a small list of lists.
In this exercise, baseball is a list of lists. The main list contains 4 elements. Each of these elements is a list containing the height and the weight of one baseball player, in that order. baseball is already coded for you in the script.
Instructions:
- Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.
- Print out the type of np_baseball.
- Print out the shape attribute of np_baseball. Use np_baseball.shape.
# Import numpy
import numpy as np
# Create baseball, a list of lists
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]
# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out the type of np_baseball
print(type(np_baseball))
# Print out the shape of np_baseball
print(np_baseball.shape)
<class 'numpy.ndarray'>
(4, 2)
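shape reports (rows, columns), so it quickly confirms that each inner list became one row. Because an array holds a single type, the integer heights are also coerced to floats alongside the weights:

```python
import numpy as np

baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]
np_baseball = np.array(baseball)

print(np_baseball.shape)  # (4, 2): 4 players, 2 measurements
print(np_baseball.ndim)   # 2
print(np_baseball.dtype)  # float64
```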
You have another look at the MLB data and realize that it makes more sense to restructure all this information in a 2D numpy array. This array should have 1015 rows, corresponding to the 1015 baseball players you have information on, and 2 columns (for height and weight).
The MLB was, again, very helpful and passed you the data in a different structure, a Python list of lists. In this list of lists, each sublist represents the height and weight of a single baseball player. The name of this embedded list is baseball.
Can you store the data as a 2D array to unlock numpy’s extra functionality? baseball is available as a regular list of lists.
# Import numpy package
import numpy as np
# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out the shape of np_baseball
print(np_baseball.shape)
(4, 2)
If your 2D numpy array has a regular structure, i.e. each row and column has a fixed number of values, complicated ways of subsetting become very easy. Have a look at the code below, where the elements “a” and “c” are extracted from a list of lists.
# regular list of lists
x = [["a", "b"], ["c", "d"]]
[x[0][0], x[1][0]]
# numpy
import numpy as np
np_x = np.array(x)
np_x[:, 0]
For regular Python lists, this is a real pain. For 2D numpy arrays, however, it’s pretty intuitive! The indexes before the comma refer to the rows, while those after the comma refer to the columns. The : is for slicing; in this example, it tells Python to include all rows.
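Rows and single elements follow the same [rows, columns] pattern:

```python
import numpy as np

x = [["a", "b"], ["c", "d"]]
np_x = np.array(x)

print(np_x[:, 0])  # ['a' 'c']: all rows, column 0
print(np_x[1, :])  # ['c' 'd']: row 1, all columns
print(np_x[1, 0])  # c: row 1, column 0
```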
The code that converts the pre-loaded baseball list to a 2D numpy array is already in the script. The first column contains the players’ height in inches and the second column holds player weight, in pounds. Add some lines to make the correct selections. Remember that in Python, the first element is at index 0! baseball is available as a regular list of lists.
# Import numpy package
import numpy as np
# Create np_baseball (2 cols)
np_baseball = np.array(baseball)
# Print out the 50th row of np_baseball
print(np_baseball[49, :])
# Select the entire second column of np_baseball: np_weight_lb
np_weight_lb = np_baseball[:, 1]
# Print out height of 124th player
print(np_baseball[123,0])
[ 70 195]
75
Remember how you calculated the Body Mass Index for all baseball players? numpy was able to perform all calculations element-wise (i.e. element by element). For 2D numpy arrays this isn’t any different! You can combine matrices with single numbers, with vectors, and with other matrices.
Execute the code below in the IPython shell and see if you understand:
import numpy as np
np_mat = np.array([[1, 2], [3, 4], [5, 6]])
np_mat * 2
np_mat + np.array([10, 10])
np_mat + np_mat
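Each result is computed element-wise. The scalar 2 multiplies every element, the 1D array [10, 10] is added to each row, and two arrays of the same shape are combined element by element:

```python
import numpy as np

np_mat = np.array([[1, 2], [3, 4], [5, 6]])

print(np_mat * 2)
# [[ 2  4]
#  [ 6  8]
#  [10 12]]

print(np_mat + np.array([10, 10]))
# [[11 12]
#  [13 14]
#  [15 16]]

print(np_mat + np_mat)
# [[ 2  4]
#  [ 6  8]
#  [10 12]]
```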
np_baseball is coded for you; it’s again a 2D numpy array with 3 columns representing height (in inches), weight (in pounds) and age (in years). baseball is available as a regular list of lists and updated is available as a 2D numpy array.
Instructions:
- You managed to get hold of the changes in height, weight and age of all baseball players. They are available as a 2D numpy array, updated. Add np_baseball and updated and print out the result.
- You want to convert the units of height and weight to metric (meters and kilograms, respectively). As a first step, create a numpy array with three values: 0.0254, 0.453592 and 1. Name this array conversion.
- Multiply np_baseball with conversion and print out the result.
# Import numpy package
import numpy as np
# Create np_baseball (3 cols)
np_baseball = np.array(baseball)
# Print out addition of np_baseball and updated
print(np_baseball + updated)
# Create numpy array: conversion
conversion = np.array([0.0254, 0.453592, 1])
# Print out product of np_baseball and conversion
print(np_baseball*conversion)
[[ 75.2303559 168.83775102 23.99 ]
[ 75.02614252 231.09732309 35.69 ]
[ 73.1544228 215.08167641 31.78 ]
[ 72.64427532 204.90461929 36.43 ]
[ 74.00590086 190.24342718 36.71 ]
[ 69.97953547 188.19841763 30.39 ]
[ 69.62874324 222.72324216 31.77 ]
[ 72.27075194 191.12053687 36.07 ]
[ 76.47655945 220.17504464 31.19 ]
[ 71.91699376 172.98883751 28.05 ]]
[[ 1.8796 81.64656 22.99 ]
[ 1.8796 97.52228 34.69 ]
[ 1.8288 95.25432 30.78 ]
[ 1.8288 95.25432 35.43 ]
[ 1.8542 85.275296 35.71 ]
[ 1.7526 79.832192 29.39 ]
[ 1.7526 94.800728 30.77 ]
[ 1.8034 90.7184 35.07 ]
[ 1.9304 104.779752 30.19 ]
[ 1.8034 81.64656 27.05 ]]
You now know how to use numpy functions to get a better feeling for your data. It basically comes down to importing numpy and then calling several simple functions on the numpy arrays:
import numpy as np
x = [1, 4, 8, 10, 12]
np.mean(x)
np.median(x)
The baseball data is available as a 2D numpy array with 3 columns (height, weight, age) and 1015 rows. The name of this numpy array is np_baseball. After restructuring the data, however, you notice that some height values are abnormally high. Follow the instructions and discover which summary statistic is best suited if you’re dealing with so-called outliers. np_baseball is available.
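Extreme values pull the mean toward them, while the median barely moves. The sketch adds one deliberately wrong height to a few made-up plausible values:

```python
import numpy as np

heights = np.array([72, 74, 71, 73, 75])  # made-up heights, in inches
with_outlier = np.append(heights, 7200)   # a typo-like outlier

print(np.mean(heights), np.median(heights))  # 73.0 73.0
print(np.mean(with_outlier))                 # jumps above 1000
print(np.median(with_outlier))               # 73.5
```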
# Import numpy
import numpy as np
# Create np_height_in from np_baseball
np_height_in = np_baseball[:, 0]
# Print out the mean of np_height_in
print(np.mean(np_height_in))
# Print out the median of np_height_in
print(np.median(np_height_in))
72.1
72.0
Because the mean and median are so far apart, you decide to complain to the MLB. They find the error and send the corrected data over to you. It’s again available as a 2D NumPy array np_baseball, with three columns.
The Python script in the editor already includes code to print out informative messages with the different summary statistics. Can you finish the job? np_baseball is available.
Instructions:
- Use np.std() on the first column of np_baseball to calculate stddev. Replace None with the correct code.
- Use np.corrcoef() to store the correlation between the first and second column of np_baseball in corr. Replace None with the correct code.
# Import numpy
import numpy as np
# Print mean height (first column)
avg = np.mean(np_baseball[:, 0])
print("Average: " + str(avg))
# Print median height. Replace 'None'
med = np.median(np_baseball[:, 0])
print("Median: " + str(med))
# Print out the standard deviation on height. Replace 'None'
stddev = np.std(np_baseball[:, 0])
print("Standard Deviation: " + str(stddev))
# Print out correlation between first and second column. Replace 'None'
corr = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])
print("Correlation: " + str(corr))
Average: 72.1
Median: 72.0
Standard Deviation: 2.118962010041709
Correlation: [[1. 0.45629209]
[0.45629209 1. ]]
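np.corrcoef() returns a correlation matrix rather than a single number. The diagonal holds each variable's correlation with itself (always 1), and the off-diagonal entry [0, 1] is the correlation between the two inputs. np.std() uses the population formula by default (ddof=0). A sketch with small made-up arrays:

```python
import numpy as np

height = np.array([70, 72, 74, 76])      # made-up values
weight = np.array([170, 185, 190, 210])  # made-up values

corr = np.corrcoef(height, weight)
print(corr.shape)      # (2, 2)
print(corr[0, 1])      # the single correlation coefficient

print(np.std(height))  # sqrt(5), about 2.236
```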
In the last few exercises you’ve learned everything there is to know about heights and weights of baseball players. Now it’s time to dive into another sport: soccer.
You’ve contacted FIFA for some data and they handed you two lists. The lists are the following:
positions = ['GK', 'M', 'A', 'D', ...]
heights = [191, 184, 185, 180, ...]
Each element in the lists corresponds to a player. The first list, positions, contains strings representing each player’s position. The possible positions are: ‘GK’ (goalkeeper), ‘M’ (midfield), ‘A’ (attack) and ‘D’ (defense). The second list, heights, contains integers representing the height of the player in cm. The first player in the lists is a goalkeeper and is pretty tall (191 cm).
You’re fairly confident that the median height of goalkeepers is higher than that of other players on the soccer field. Some of your friends don’t believe you, so you are determined to show them using the data you received from FIFA and your newly acquired Python skills. heights and positions are available as lists.
Instructions:
- Convert heights and positions, which are regular lists, to numpy arrays. Call them np_heights and np_positions.
- Extract all the heights of the goalkeepers. You can use a little trick here: use np_positions == 'GK' as an index for np_heights. Assign the result to gk_heights.
- Extract all the heights of all the other players. This time use np_positions != 'GK' as an index for np_heights. Assign the result to other_heights.
- Print out the median height of the goalkeepers using np.median(). Replace None with the correct code.
- Do the same for the other players. Print out their median height. Replace None with the correct code.
positions = soccer_positions
heights = soccer_heights
# Import numpy
import numpy as np
# Convert positions and heights to numpy arrays: np_positions, np_heights
np_positions = np.array(positions)
np_heights = np.array(heights)
# Heights of the goalkeepers: gk_heights
gk_heights = np_heights[np_positions == 'GK']
# Heights of the other players: other_heights
other_heights = np_heights[np_positions != 'GK']
# Print out the median height of goalkeepers. Replace 'None'
print("Median height of goalkeepers: " + str(np.median(gk_heights)))
# Print out the median height of other players. Replace 'None'
print("Median height of other players: " + str(np.median(other_heights)))
Median height of goalkeepers: 188.0
Median height of other players: 181.0
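Comparing a string array with == gives a boolean mask, just like the numeric comparisons used for the BMI. The ~ operator inverts a boolean mask, so it selects the same players as != 'GK'. A sketch with a handful of made-up players:

```python
import numpy as np

# made-up positions and heights (cm)
np_positions = np.array(["GK", "M", "A", "D", "GK", "M"])
np_heights = np.array([191, 184, 185, 180, 189, 178])

is_gk = np_positions == "GK"
gk_heights = np_heights[is_gk]
other_heights = np_heights[~is_gk]  # same as np_positions != "GK"

print(np.median(gk_heights))     # 190.0
print(np.median(other_heights))  # 182.0
```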