Discovering NumPy
# Install the numpy package (run this in a terminal, not inside Python)
pip3 install numpy
# Import the numpy package as np
import numpy as np
baseball = [180, 215, 210, 210, 188, 176, 209, 200]
# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out type of np_baseball
print(type(np_baseball))
<script.py> output:
<class 'numpy.ndarray'>
Exercise 2.
Statistics on the height of the main baseball players are required. Data on more than a thousand players is available as a regular Python list, height_in, with heights expressed in inches.
→ Make a numpy array out of it and convert the units to meters.
# Import numpy
import numpy as np
# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)
# Print out np_height_in
print(np_height_in)
# Convert np_height_in to m: np_height_m
np_height_m = np_height_in * 0.0254
# Print np_height_m
print(np_height_m)
<script.py> output:
[74 74 72 ... 75 75 73]
[1.8796 1.8796 1.8288 ... 1.905 1.905 1.8542]
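height_in is only pre-loaded in the exercise environment, so here is a self-contained sketch that uses a short hypothetical sample of heights. It shows that multiplying a numpy array by a scalar converts every element at once, while a plain list needs an explicit comprehension to do the same.

```python
import numpy as np

# Hypothetical sample of heights in inches (stand-in for height_in)
height_in = [74, 74, 72, 75, 73]

np_height_in = np.array(height_in)

# Multiplying by a scalar converts every element: 1 inch = 0.0254 m
np_height_m = np_height_in * 0.0254
print(np_height_m)

# With a regular list, the same conversion needs an explicit comprehension
height_m_list = [h * 0.0254 for h in height_in]
print(height_m_list)
```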
Exercise. Vector arithmetic in NumPy
Differences between numpy arrays and regular Python lists:
- numpy arrays cannot contain elements of different types. If types are mixed, like booleans and integers, numpy automatically converts them to a common type. Booleans like True and False are treated as 1 and 0 when combined with numbers, so the array ends up as integers.
- The typical arithmetic operators, such as +, -, * and /, have a different meaning for regular Python lists and numpy arrays: on lists, + concatenates and * with an integer repeats the list, while on numpy arrays they work element-wise.
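A minimal sketch of both differences, using small made-up lists: the first part shows booleans being coerced to integers, the second contrasts + and * on lists with the same operators on numpy arrays.

```python
import numpy as np

# Mixing booleans and integers: numpy converts everything to integers
mixed = np.array([True, 1, 2])
print(mixed)        # True has become 1
print(mixed.dtype)  # an integer dtype

# '+' on lists concatenates; on numpy arrays it adds element by element
list_a = [1, 2, 3]
list_b = [4, 5, 6]
print(list_a + list_b)                      # concatenation: 6 elements
print(np.array(list_a) + np.array(list_b))  # element-wise sum: 3 elements

# '*' with an integer repeats a list, but multiplies each array element
print(list_a * 2)
print(np.array(list_a) * 2)
```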
→ Subset np_weight_lb by printing out the element at index 50.
→ Print out a sub-array of np_height_in that contains the elements at index 100 up to and including index 110.
import numpy as np
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)
# Print out the weight at index 50
print(np_weight_lb[50])
# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])
<script.py> output:
200
[73 74 72 73 69 72 73 75 75 73 72]
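The slice 100:111 is used because the stop index is exclusive. The sketch below makes that visible with a hypothetical stand-in array whose values equal their own indices, so the printed values are the indices themselves.

```python
import numpy as np

# Hypothetical stand-in: each value equals its index, so results are easy to read
np_height_in = np.arange(200)

# A single element at index 50
print(np_height_in[50])

# The stop is exclusive: 100:111 returns indices 100 through 110 (11 elements)
sub = np_height_in[100:111]
print(sub)
print(len(sub))
```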
2D NumPy Arrays
np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],
                  [65.4, 14.7, 45.7, 13.1, 18.8]])
np_2d.shape
Output:
(2, 5)  # 2 rows, 5 columns
.shape is a so-called attribute of the np_2d array: it gives more information about what the data structure looks like. Note that the syntax for accessing an attribute looks a bit like calling a method, but they are not the same!
Remember: methods have round brackets after them, and attributes do not.
The NumPy rule also applies to 2D arrays: an array can only contain a single type. If you change one float to a string, all the array elements are coerced to strings so that the array stays homogeneous.
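A short sketch of the two points above, using the family height/weight data from the next section. .shape is read without brackets, while .mean() (one example of an ndarray method) is called with them. Replacing a single float with the string "1.71" turns the whole array into a string array.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],
                  [65.4, 64.7, 45.7, 63.1, 78.8]])

# .shape is an attribute: no round brackets
print(np_2d.shape)
# .mean() is a method: it is called with round brackets
print(np_2d.mean())

# One string among floats coerces every element to a string
mixed_2d = np.array([[1.73, 1.68, "1.71", 1.79, 1.69],
                     [65.4, 64.7, 45.7, 63.1, 78.8]])
print(mixed_2d.dtype)   # a Unicode string dtype (kind 'U')
print(mixed_2d[1, 0])   # a former float, now stored as a string
```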
Subsetting 2D NumPy Arrays
# columns:          0     1     2     3     4
np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],   # row 0: height of the family members
                  [65.4, 64.7, 45.7, 63.1, 78.8]])  # row 1: weight of the family members
np_2d[0][2]  # first index selects the row, second the column
Output: 1.71
np_2d[0, 2]
Output: 1.71
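Both forms return the same value, but they get there differently. The sketch below splits the chained form into its two steps: np_2d[0] extracts row 0 as a 1D array, and [2] then picks from that row. np_2d[0, 2] does the selection in a single indexing operation.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],
                  [65.4, 64.7, 45.7, 63.1, 78.8]])

# Chained indexing: first extract row 0, then take its element at index 2
row_0 = np_2d[0]
chained = row_0[2]

# Single-bracket indexing: row index and column index in one step
direct = np_2d[0, 2]

print(chained, direct)
```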
# columns:          0     1     2     3     4
np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],   # row 0
                  [65.4, 64.7, 45.7, 63.1, 78.8]])  # row 1
# select the height and weight of the second and third family members
np_2d[:, 1:3]  # we need both rows, so we put a colon before the comma
Output: array([[1.68, 1.71],
[64.7, 45.7]])
# if only weight data is required:
np_2d[1, :]
Output: array([65.4, 64.7, 45.7, 63.1, 78.8])
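The same slicing rules as a self-contained sketch. A colon on either side of the comma keeps every row or every column, and 1:3 keeps columns 1 and 2 because the stop is exclusive. Selecting column 0 on its own (not shown in the notes) gives the height and weight of the first family member.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],
                  [65.4, 64.7, 45.7, 63.1, 78.8]])

# ':' keeps all rows; 1:3 keeps columns 1 and 2
second_third = np_2d[:, 1:3]
print(second_third.shape)

# Row 1 with all columns: the weights
weights = np_2d[1, :]
# Column 0 with all rows: height and weight of the first family member
first_member = np_2d[:, 0]
print(weights)
print(first_member)
```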
Exercise 1
import numpy as np
baseball = [[180, 78.4],
[215, 102.7],
[210, 98.5],
[188, 75.2]]
# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out the type of np_baseball
print(type(np_baseball))
# Print out the shape of np_baseball
print(np_baseball.shape)
<script.py> output:
<class 'numpy.ndarray'>
(4, 2)
Exercise 2
import numpy as np
np_baseball = np.array(baseball)
# Print out the 50th row of np_baseball
print(np_baseball[49, :])
# Select the entire second column of np_baseball: np_weight_lb
np_weight_lb = np_baseball[:, 1]
# Print out height of 124th player
print(np_baseball[123, 0])
<script.py> output:
[ 70 195]
75
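The full baseball data is only available in the exercise environment, so this sketch repeats the same selections on a hypothetical three-player array. The column layout [height_in, weight_lb] is assumed to match the exercise. Because indices start at 0, the n-th row or player is at index n - 1.

```python
import numpy as np

# Hypothetical 3-player sample; columns assumed to be [height_in, weight_lb]
np_baseball = np.array([[74, 180],
                        [70, 195],
                        [75, 210]])

# 2nd row (index 1), all columns
print(np_baseball[1, :])

# Entire second column (index 1): all weights
np_weight_lb = np_baseball[:, 1]
print(np_weight_lb)

# Height of the 3rd player: row index 2, column index 0
print(np_baseball[2, 0])
```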
Exercise 3
Instructions:
You managed to get hold of the changes in height, weight and age of all baseball players. It is available as a 2D numpy array, updated.
→ Add np_baseball and updated and print out the result.
→ You want to convert the units of height and weight to metric (meters and kilograms, respectively). As a first step, create a numpy array with three values: 0.0254, 0.453592 and 1. Name this array conversion.
→ Multiply np_baseball with conversion and print out the result.
import numpy as np
np_baseball = np.array(baseball)
# Print out addition of np_baseball and updated
print(updated + np_baseball)
# Create numpy array: conversion
conversion = np.array([0.0254, 0.453592, 1])
# Print out product of np_baseball and conversion
print(np_baseball * conversion)
<script.py> output:
[[ 75.2303559 168.83775102 23.99 ]
 [ 75.02614252 231.09732309 35.69 ]
 [ 73.1544228 215.08167641 31.78 ]
 ...
 [ 76.09349925 209.23890778 26.19 ]
 [ 75.82285669 172.21799965 32.01 ]
 [ 73.99484223 203.14402711 28.92 ]]
[[ 1.8796 81.64656 22.99 ]
 [ 1.8796 97.52228 34.69 ]
 [ 1.8288 95.25432 30.78 ]
 ...
 [ 1.905 92.98636 25.19 ]
 [ 1.905 86.18248 31.01 ]
 [ 1.8542 88.45044 27.92 ]]
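np_baseball * conversion works because NumPy broadcasts the 1D conversion array (shape (3,)) across every row of the (n, 3) array, so each column gets its own factor. The sketch below assumes two rows reconstructed from the first two rows of the converted output above (for example, 1.8796 / 0.0254 = 74 inches). The `updated` values are hypothetical.

```python
import numpy as np

# Two players, reconstructed from the converted output: [height_in, weight_lb, age]
np_baseball = np.array([[74, 180, 22.99],
                        [74, 215, 34.69]])

# Hypothetical changes in height, weight and age (stand-in for `updated`)
updated = np.array([[1.0, -2.0, 1.0],
                    [0.5, 3.0, 1.0]])

# Arrays of the same shape are added element by element
print(np_baseball + updated)

# One factor per column: inches -> meters, pounds -> kilograms, age unchanged
conversion = np.array([0.0254, 0.453592, 1])

# Shapes (2, 3) and (3,) broadcast: every row is multiplied by conversion
np_baseball_metric = np_baseball * conversion
print(np_baseball_metric)
```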
NumPy: Basic Statistics
Exercise 1
The baseball data is available as a 2D numpy array, np_baseball, with 3 columns (height, weight, age) and 1015 rows. After restructuring the data, however, it becomes clear that some height values are abnormally high. Find out which summary statistic is best suited when you are dealing with such so-called outliers.
import numpy as np
# Create np_height_in from np_baseball that is equal to first column of np_baseball
np_height_in = np_baseball[:, 0]
# Print out the mean of np_height_in
print(np.mean(np_height_in))
# Print out the median of np_height_in
print(np.median(np_height_in))
<script.py> output:
1586.4610837438424
74.0
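The gap between 1586.46 and 74.0 is the effect of outliers. Here is a small illustration with hypothetical heights in which one value is entered as 7400 instead of 74. The mean is pulled far away from any real height, while the median, the middle value of the sorted data, does not move.

```python
import numpy as np

# Hypothetical heights in inches; the last value is an outlier (7400 instead of 74)
heights = np.array([72, 73, 74, 74, 75, 76, 7400])

# Without the outlier, mean and median agree
print(np.mean(heights[:-1]), np.median(heights[:-1]))

# With the outlier, the mean jumps above 1000 while the median stays the same
print(np.mean(heights), np.median(heights))
```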
Exercise 2
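import numpy as np
# Print mean height (first column)
avg = np.mean(np_baseball[:, 0])
print("Average: " + str(avg))
# Print median height (first column only, not the whole array)
med = np.median(np_baseball[:, 0])
print("Median: " + str(med))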
# Print out the standard deviation on height
stddev = np.std(np_baseball[:, 0])
print("Standard Deviation: " + str(stddev))
# Print out correlation between first and second column
corr = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])
print("Correlation: " + str(corr))
<script.py> output:
Average: 73.6896551724138
Median: 74.0
Standard Deviation: 2.312791881046546
Correlation: [[1. 0.53153932]
[0.53153932 1. ]]
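A sketch on hypothetical height/weight data that clarifies two details of the output above. np.std computes the population standard deviation by default (ddof=0), which the manual formula reproduces. np.corrcoef returns a 2x2 matrix with 1s on the diagonal, and the correlation between the two inputs is the off-diagonal entry.

```python
import numpy as np

# Hypothetical heights (in) and weights (lb) of five players
height = np.array([70, 72, 74, 76, 78])
weight = np.array([160, 175, 185, 200, 210])

# np.std defaults to the population standard deviation (ddof=0)
stddev = np.std(height)
manual = np.sqrt(np.mean((height - np.mean(height)) ** 2))
print(stddev, manual)

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is the height-weight correlation
corr = np.corrcoef(height, weight)
print(corr)
print(corr[0, 1])
```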