Discovering NumPy

# Install the numpy package (run this in a terminal, not inside Python)
pip3 install numpy

# Import the numpy package as np
import numpy as np

baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))

<script.py> output:
    <class 'numpy.ndarray'>

Exercise 2.

Statistics on the heights of the main baseball team players are required. Data on more than a thousand players is available, stored as a regular Python list: height_in. The heights are expressed in inches.
→ Make a numpy array out of it and convert the units to meters.

# Import numpy
import numpy as np

# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)

# Print out np_height_in
print(np_height_in)

# Convert np_height_in to m: np_height_m
np_height_m = np_height_in * 0.0254

# Print np_height_m
print(np_height_m)

<script.py> output:
    [74 74 72 ... 75 75 73]
    [1.8796 1.8796 1.8288 ... 1.905  1.905  1.8542]

Exercise. Vector arithmetics in NumPy

NumPy arrays differ from regular Python lists in two ways:

  • numpy arrays cannot contain elements with different types. If types are mixed, like booleans and integers, numpy automatically converts them to a common type. Booleans like True and False are treated as 1 and 0 when combined with numbers, so the array ends up as integers.

  • the typical arithmetic operators, such as +, -, * and / have a different meaning for regular Python lists and numpy arrays.
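
The two differences above can be seen side by side in a short sketch. The sample values and the variable names (heights, more_heights, mixed) are made up for illustration and are not part of the exercise data.

```python
import numpy as np

# Hypothetical sample values, not taken from the exercise data
heights = [1.73, 1.68, 1.71]
more_heights = [1.89, 1.79]

# On lists, + concatenates and * repeats
list_sum = heights + more_heights
list_twice = heights * 2

# On arrays, the same operators work element by element
np_heights = np.array(heights)
array_sum = np_heights + np_heights
array_twice = np_heights * 2

# Mixed booleans and integers are coerced to a single integer type
mixed = np.array([True, 1, 2, False])

print(list_sum)
print(list_twice)
print(array_sum)
print(array_twice)
print(mixed, mixed.dtype)
```

This is also why the unit conversion earlier works on np_height_in: multiplying a plain list by 0.0254 raises a TypeError, because lists only support repetition by an integer.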

→ Subset np_weight_lb by printing out the element at index 50.

→ Print out a sub-array of np_height_in that contains the elements at index 100 up to and including index 110.

import numpy as np

np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)

# Print out the weight at index 50
print(np_weight_lb[50])

# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])

<script.py> output:
    200
    [73 74 72 73 69 72 73 75 75 73 72]
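
The slice 100:111 is needed because the end of a slice is exclusive. A minimal sketch, using a made-up array (np_values) whose values equal their own indices so the effect is easy to read:

```python
import numpy as np

# Hypothetical stand-in for the exercise data: the values 0..199
np_values = np.arange(200)

single = np_values[50]          # one element
inclusive = np_values[100:111]  # indices 100 through 110
exclusive = np_values[100:110]  # indices 100 through 109

print(single)
print(len(inclusive), inclusive[-1])
print(len(exclusive), exclusive[-1])
```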

2D NumPy Arrays

np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],
                  [65.4, 14.7, 45.7, 13.1, 18.8]])

np_2d.shape

Output: 
       (2, 5) # 2 rows, 5 columns
  • .shape is an attribute of the np_2d array. It tells you more about what the data structure looks like.

  • Note that the syntax for accessing an attribute looks a bit like calling a method, but they are not the same.

  • Remember: methods have round brackets after them, and attributes do not.

  • The NumPy rule also applies to 2D arrays: an array can only contain a single type. If you change one float to a string, all the array elements are coerced to strings, so the array stays homogeneous.
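
The points above can be checked with a short sketch. It reuses the height and weight values from the subsetting example; np_mixed is a made-up array added only to show the coercion.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],
                  [65.4, 64.7, 45.7, 63.1, 78.8]])

# Attributes: no round brackets
print(np_2d.shape)   # (rows, columns)
print(np_2d.dtype)

# Method: round brackets
print(np_2d.mean())

# Replacing one float with a string coerces every element to a string
np_mixed = np.array([[1.73, "1.68"],
                     [65.4, 64.7]])
print(np_mixed.dtype.kind)  # 'U' means Unicode string
print(np_mixed)
```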

Subsetting 2D NumPy Arrays

#                    0     1     2     3     4       (column index)

np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],    # row 0: height of the family members
                  [65.4, 64.7, 45.7, 63.1, 78.8]])   # row 1: weight of the family members

np_2d[0][2]    # the first index selects the row, the second the column
Output: 1.71

np_2d[0, 2]
Output: 1.71
#                    0     1     2     3     4       (column index)

np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],    # row 0
                  [65.4, 64.7, 45.7, 63.1, 78.8]])   # row 1

# select the height and weight of the second and third family members

np_2d[:, 1:3] # the colon before the comma keeps both rows; 1:3 picks columns 1 and 2

Output: array([[1.68, 1.71],
               [64.7, 45.7]])

# if only weight data is required: 

np_2d[1, :]

Output: array([65.4, 64.7, 45.7, 63.1, 78.8])
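
One detail the outputs above hint at: an integer index drops a dimension, while a slice keeps it. A small sketch on the same array; the names block, weights and row_2d are introduced here for illustration only.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.79, 1.69],
                  [65.4, 64.7, 45.7, 63.1, 78.8]])

# Chained and comma indexing select the same element
print(np_2d[0][2] == np_2d[0, 2])

block = np_2d[:, 1:3]   # slices on both axes: stays 2D
weights = np_2d[1, :]   # integer row index: result is 1D
row_2d = np_2d[1:2, :]  # a slice of length one keeps the row 2D

print(block.shape)
print(weights.shape)
print(row_2d.shape)
```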

Exercise 1

import numpy as np

baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out the type of np_baseball
print(type(np_baseball))

# Print out the shape of np_baseball (shape is an attribute, so no assignment or call is needed)
print(np_baseball.shape)

<script.py> output:
    <class 'numpy.ndarray'>
    (4, 2)

Exercise 2

import numpy as np

np_baseball = np.array(baseball)

# Print out the 50th row of np_baseball
print(np_baseball[49, :])

# Select the entire second column of np_baseball: np_weight_lb
np_weight_lb = np_baseball[:, 1]

# Print out height of 124th player
print(np_baseball[123, 0])

<script.py> output:
    [ 70 195]
    75

Exercise 3

Instructions:

  • You managed to get hold of the changes in height, weight and age of all baseball players. It is available as a 2D numpy array, updated. Add np_baseball and updated and print out the result.

  • You want to convert the units of height and weight to metric (meters and kilograms, respectively). As a first step, create a numpy array with three values: 0.0254, 0.453592 and 1. Name this array conversion.

  • Multiply np_baseball with conversion and print out the result.

      import numpy as np
    
      np_baseball = np.array(baseball)
    
      # Print out addition of np_baseball and updated
      print(updated + np_baseball)
    
      # Create numpy array: conversion
      conversion = np.array([0.0254, 0.453592, 1])
    
      # Print out product of np_baseball and conversion
      print(np_baseball * conversion)
    
      <script.py> output:
          [[ 75.2303559  168.83775102  23.99      ]
           [ 75.02614252 231.09732309  35.69      ]
           [ 73.1544228  215.08167641  31.78      ]
           ...
           [ 76.09349925 209.23890778  26.19      ]
           [ 75.82285669 172.21799965  32.01      ]
           [ 73.99484223 203.14402711  28.92      ]]
          [[ 1.8796  81.64656 22.99   ]
           [ 1.8796  97.52228 34.69   ]
           [ 1.8288  95.25432 30.78   ]
           ...
           [ 1.905   92.98636 25.19   ]
           [ 1.905   86.18248 31.01   ]
           [ 1.8542  88.45044 27.92   ]]
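
The multiplication works because NumPy broadcasts the three-element conversion array across every row. A minimal sketch, assuming a two-row sample (np_sample, a made-up name) whose values are chosen to be consistent with the first two rows of the converted output above:

```python
import numpy as np

# Illustrative two-player sample: height (in), weight (lb), age (years)
np_sample = np.array([[74.0, 180.0, 22.99],
                      [74.0, 215.0, 34.69]])

conversion = np.array([0.0254, 0.453592, 1])

# conversion has shape (3,), so it is applied to each row of the (2, 3) sample:
# column 0 is scaled to meters, column 1 to kilograms, column 2 is left as is
np_metric = np_sample * conversion

print(np_metric)
```

The shapes only need to match along the last axis here; an array with two values instead of three would raise a ValueError.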
    

NumPy: Basic Statistics

Exercise 1

The baseball data is available as a 2D numpy array with 3 columns (height, weight, age) and 1015 rows. The name of this numpy array is np_baseball. After restructuring the data, however, it becomes clear that some height values are abnormally high. Discover which summary statistic is best suited if you're dealing with so-called outliers. np_baseball is available.

import numpy as np

# Create np_height_in from np_baseball that is equal to first column of np_baseball
np_height_in = np_baseball[:, 0]

# Print out the mean of np_height_in
print(np.mean(np_height_in))

# Print out the median of np_height_in
print(np.median(np_height_in))

<script.py> output:
    1586.4610837438424
    74.0
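
The gap between the two numbers is what outliers do to a mean, while the median barely moves. A small sketch with made-up heights, where one value is a deliberate data-entry error:

```python
import numpy as np

# Hypothetical heights in inches; the last value is a deliberate data-entry error
np_heights = np.array([72, 73, 74, 74, 75, 7400])

mean_height = np.mean(np_heights)      # pulled far upward by the single outlier
median_height = np.median(np_heights)  # stays at a typical height

print(mean_height)
print(median_height)
```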

Exercise 2

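import numpy as np

# Print mean height (first column)
avg = np.mean(np_baseball[:, 0])
print("Average: " + str(avg))

# Print median height (first column only, not the whole array)
med = np.median(np_baseball[:, 0])
print("Median: " + str(med))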

# Print out the standard deviation on height
stddev = np.std(np_baseball[:, 0])
print("Standard Deviation: " + str(stddev))

# Print out correlation between first and second column
corr = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])
print("Correlation: " + str(corr))

<script.py> output:
    Average: 73.6896551724138
    Median: 74.0
    Standard Deviation: 2.312791881046546
    Correlation: [[1.         0.53153932]
     [0.53153932 1.        ]]
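
np.corrcoef returns a 2x2 correlation matrix rather than a single number, which is why the output spans two lines. A sketch of pulling out just the height-weight coefficient, using made-up sample values (height, weight) rather than the exercise data:

```python
import numpy as np

# Hypothetical height (in) and weight (lb) values
height = np.array([70, 72, 74, 75, 77])
weight = np.array([170, 185, 190, 210, 220])

corr_matrix = np.corrcoef(height, weight)

# Off-diagonal entries hold the height-weight correlation; the diagonal is always 1
corr_hw = corr_matrix[0, 1]
print(corr_matrix)
print(corr_hw)
```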