Unix/Python/NumPy Tutorial

Brief Overview of Python and NumPy

(Adapted from notes by Hal Daumé, U. of Maryland)

Python Basics
    Invoking the Interpreter
    Operators
    Strings
    Dir and Help
    Built-in Data Structures
        Lists
        Tuples
        Sets
        Dictionaries
    Writing Scripts
    Indentation
    Writing Functions
    Object Basics
        Defining Classes
        Using Objects
    Tips and Tricks
    Troubleshooting
NumPy
    NumPy Basics
    matplotlib Basics
More References

Invoking the Interpreter

Python can be run in one of two modes: interactively, via an interpreter, or from the command line to execute a script. We will first use the Python interpreter interactively. Typically, you invoke the interpreter by entering python at the command prompt.

Operators

The Python interpreter can be used to evaluate expressions, for example simple arithmetic expressions. If you enter such expressions at the prompt (>>>) they will be evaluated and the result will be printed on the next line.

>>> 1 + 1
2
>>> 2 * 3
6
>>> 2 ** 3
8

Boolean operators also exist in Python.

>>> 1==0
False
>>> not (1==0)
True
>>> (2==2) and (2==3)
False
>>> (2==2) or (2==3)
True
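When several operators appear in one expression, Python applies them in a fixed order of precedence. The sketch below is only an illustration with made-up variable names; it uses print(...) with parentheses so that it runs under both Python 2 and Python 3.

```python
# Arithmetic: ** binds tighter than *, which binds tighter than +
a = 2 + 3 * 4        # same as 2 + (3 * 4)
b = (2 + 3) * 4      # parentheses override the default order
c = 2 * 3 ** 2       # same as 2 * (3 ** 2)

# Boolean: comparisons first, then not, then and, then or
d = not 1 == 0 and 2 == 2            # (not (1 == 0)) and (2 == 2)
e = 2 == 3 or 2 == 2 and 1 == 1      # (2 == 3) or ((2 == 2) and (1 == 1))

print((a, b, c))     # (14, 20, 18)
print((d, e))        # (True, True)
```

When in doubt, adding parentheses as in the interpreter examples above makes the intended grouping explicit.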

Strings

Like Java, Python has a built-in string type. The + operator is overloaded to concatenate strings.





>>> 'machine' + "learning" 

'machinelearning'

There are many built-in methods which allow you to manipulate strings.

 
>>> 'machine'.upper()

'MACHINE'

>>> 'HELP'.lower()

'help'

>>> len('Help')

4

Notice that we can use either single quotes ' ' or double quotes " " to surround a string.
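A few more string methods are sketched below as an illustration. The variable names are made up for this example, and print(...) is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
s = 'machine' + " " + "learning"      # either quote style produces a string

words = s.split()                     # split on whitespace
title = ' '.join(w.capitalize() for w in words)
position = s.find('learn')            # index of the first match, or -1
swapped = s.replace('machine', 'deep')

print(words)      # ['machine', 'learning']
print(title)      # Machine Learning
print(position)   # 8
print(swapped)    # deep learning
```

None of these methods changes s itself: strings are immutable, so each call returns a new value.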

We can also store the values of expressions in variables.





>>> s = 'hello world' 

>>> print s 

hello world 

>>> s.upper()

'HELLO WORLD'

>>> len(s.upper())

11

>>> num = 8.0 

>>> num += 2.5 

>>> print num 

10.5

In Python, unlike Java or C, you do not have to declare variables before you assign to them.

Dir and Help

To see what methods Python provides for a datatype, use the dir and help commands:

>>> s = 'abc'

>>> dir(s)

['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__ge__',
 '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__',
 '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__str__',
 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find',
 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper',
 'join', 'ljust', 'lower', 'lstrip', 'replace', 'rfind', 'rindex', 'rjust', 'rsplit',
 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
 'upper', 'zfill']

>>> help(s.find)

Help on built-in function find:



find(...)
    S.find(sub [,start [,end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.


>>> s.find('b')

1
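Because dir returns an ordinary list of names, it can be filtered like any other list. The following is a small sketch (the variable name publicMethods is made up here) that hides the double-underscore names and reads the same documentation text that help displays; it runs under both Python 2 and Python 3.

```python
s = 'abc'

# Keep only the names that do not start with an underscore
publicMethods = [name for name in dir(s) if not name.startswith('_')]
print('find' in publicMethods)    # True

# help(s.find) displays the method's docstring, also available as __doc__
doc = s.find.__doc__

print(s.find('b'))    # 1
print(s.find('z'))    # -1, because 'z' does not occur in s
```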

Built-in Data Structures

Lists

A list stores a sequence of items and is mutable, so its contents can be changed after it is created:





>>> fruits = ['apple','orange','pear','banana']

>>> fruits[0] 

'apple'

We can use the + operator to do list concatenation:

>>> otherFruits = ['kiwi','strawberry']

>>> fruits + otherFruits

['apple', 'orange', 'pear', 'banana', 'kiwi', 'strawberry']

Python also allows negative-indexing from the back of the list. For instance, fruits[-1] will access the last element 'banana':




>>> fruits[-2]

'pear'

>>> fruits.pop()

'banana'

>>> fruits

['apple', 'orange', 'pear']

>>> fruits.append('grapefruit') 

>>> fruits 

['apple', 'orange', 'pear', 'grapefruit'] 

>>> fruits[-1] = 'pineapple' 

>>> fruits 

['apple', 'orange', 'pear', 'pineapple']

We can also index multiple adjacent elements using the slice operator. For instance, fruits[1:3] returns a list containing the elements at positions 1 and 2. In general, fruits[start:stop] returns the elements at positions start, start+1, ..., stop-1. We can also write fruits[start:], which returns all elements from position start onward, and fruits[:stop], which returns all elements before position stop:




>>> fruits[0:2] 

['apple', 'orange'] 

>>> fruits[:3]

['apple', 'orange', 'pear'] 

>>> fruits[2:]

['pear', 'pineapple'] 

>>> len(fruits) 

4
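The slicing rules above can be combined with negative indices and an optional step. This is a short sketch using the same fruits list; the variable names on the left are made up for illustration, and the snippet runs under both Python 2 and Python 3.

```python
fruits = ['apple', 'orange', 'pear', 'pineapple']

middle = fruits[1:3]        # positions 1 and 2
lastTwo = fruits[-2:]       # negative indices count from the end
everyOther = fruits[::2]    # a third number is the step
clipped = fruits[2:10]      # a stop past the end is clipped, not an error

# A full slice makes a copy, so changing it leaves fruits alone
copy = fruits[:]
copy.append('kiwi')

print(middle)         # ['orange', 'pear']
print(lastTwo)        # ['pear', 'pineapple']
print(everyOther)     # ['apple', 'pear']
print(clipped)        # ['pear', 'pineapple']
print(len(fruits))    # 4
```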

The items stored in lists can be any Python data type. So for instance we can have lists of lists:




>>> lstOfLsts = [['a','b','c'],[1,2,3],['one','two','three']] 

>>> lstOfLsts[1][2]  

3

>>> lstOfLsts[0].pop()

'c'

>>> lstOfLsts

[['a', 'b'],[1, 2, 3],['one', 'two', 'three']]

Some list methods, such as reverse, modify the list in place and return nothing:

>>> lst = ['a','b','c']

>>> lst.reverse()

>>> lst

['c', 'b', 'a']
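Since reverse changes the list in place and returns None, assigning its result to a variable is a common slip. The sketch below contrasts it with ways of getting a reversed or sorted copy; the variable names are made up for this example, and the code runs under both Python 2 and Python 3.

```python
lst = ['a', 'b', 'c']

returned = lst.reverse()          # reverses lst itself and returns None
print(returned)                   # None
print(lst)                        # ['c', 'b', 'a']

backAgain = list(reversed(lst))   # a new list; lst is left unchanged
alsoReversed = lst[::-1]          # slicing with step -1 also makes a copy
inOrder = sorted(lst)             # a new sorted list

print(backAgain)                  # ['a', 'b', 'c']
print(lst)                        # ['c', 'b', 'a']
```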

Tuples

A data structure similar to the list is the tuple, which is like a list except that it is immutable once it is created (i.e., you cannot change its content once created). Note that tuples are surrounded with parentheses while lists have square brackets.





>>> pair = (3,5)

>>> pair[0]

3

>>> x,y = pair

>>> x

3

>>> y

5 

>>> pair[1] = 6

TypeError: 'tuple' object does not support item assignment

The attempt to modify an immutable structure raised an exception. This is how many errors will manifest: index out of bounds errors, type errors, and so on will all report exceptions in this way.
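Exceptions like these can be caught with try/except rather than stopping the program. The sketch below is only an illustration (the variable names are made up); it triggers three common exceptions and records each one, and it runs under both Python 2 and Python 3.

```python
pair = (3, 5)
try:
    pair[1] = 6                   # tuples are immutable
except TypeError as err:
    tupleMessage = str(err)

fruits = ['apple', 'orange']
try:
    fruits[5]                     # there is no element at position 5
except IndexError as err:
    indexMessage = str(err)

prices = {'apples': 2.00}
try:
    prices['kiwis']               # this key is not in the dictionary
except KeyError as err:
    missingKey = err.args[0]

print(tupleMessage)
print(indexMessage)
print(missingKey)                 # kiwis
```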

Sets

A set is another data structure that serves as an unordered list with no duplicate items. Below, we show how to create a set, add things to the set, test if an item is in the set, and perform common set operations (difference, intersection, union):





>>> shapes = ['circle','square','triangle','circle']

>>> setOfShapes = set(shapes)

>>> setOfShapes 

set(['circle','square','triangle']) 

>>> setOfShapes.add('polygon') 

>>> setOfShapes 

set(['circle','square','triangle','polygon']) 

>>> 'circle' in setOfShapes 

True 

>>> 'rhombus' in setOfShapes 

False 

>>> favoriteShapes = ['circle','triangle','hexagon']

>>> setOfFavoriteShapes = set(favoriteShapes)

>>> setOfShapes - setOfFavoriteShapes 

set(['square','polygon']) 

>>> setOfShapes & setOfFavoriteShapes 

set(['circle','triangle'])

>>> setOfShapes | setOfFavoriteShapes 

set(['circle','square','triangle','polygon','hexagon'])
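Two further set operations are sketched below: symmetric difference (items in exactly one of the two sets) and the subset test. Because sets are unordered, the example sorts them before printing so the output is predictable. The variable names exactlyOne and isSubset are made up for this illustration, and the code runs under both Python 2 and Python 3.

```python
shapes = ['circle', 'square', 'triangle', 'circle']
setOfShapes = set(shapes)                      # the duplicate 'circle' is dropped
setOfFavoriteShapes = set(['circle', 'triangle', 'hexagon'])

exactlyOne = setOfShapes ^ setOfFavoriteShapes # in one set but not both
isSubset = set(['circle', 'square']).issubset(setOfShapes)

print(len(setOfShapes))       # 3
print(sorted(setOfShapes))    # ['circle', 'square', 'triangle']
print(sorted(exactlyOne))     # ['hexagon', 'square']
print(isSubset)               # True
```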

Dictionaries (Dicts)

The last built-in data structure is the dictionary which stores a map from one type of object (the key) to another (the value). The key must be an immutable type (string, number, or tuple). The value can be any Python data type.

Note: In the example below, the order in which Python prints the keys may differ from what is shown. Unlike a list, which has a fixed ordering, a dictionary in the Python 2 used here is simply a hash table with no fixed ordering of its keys (from Python 3.7 on, dictionaries preserve insertion order).





>>> studentIds = {'knuth': 42.0, 'turing': 56.0, 'nash': 92.0 }

>>> studentIds['turing']

56.0

>>> studentIds['nash'] = 'ninety-two'

>>> studentIds

{'knuth': 42.0, 'turing': 56.0, 'nash': 'ninety-two'}

>>> del studentIds['knuth']

>>> studentIds

{'turing': 56.0, 'nash': 'ninety-two'}

>>> studentIds['knuth'] = [42.0,'forty-two']

>>> studentIds

{'knuth': [42.0, 'forty-two'], 'turing': 56.0, 'nash': 'ninety-two'}

>>> studentIds.keys()

['knuth', 'turing', 'nash']

>>> studentIds.values()

[[42.0, 'forty-two'], 56.0, 'ninety-two']

>>> studentIds.items()

[('knuth',[42.0, 'forty-two']), ('turing',56.0), ('nash','ninety-two')]

>>> len(studentIds) 

3

As with nested lists, you can also create dictionaries of dictionaries.
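As an illustration of a dictionary of dictionaries, here is a minimal sketch. The 'id' and 'courses' fields and all the data are made up for this example; the code runs under both Python 2 and Python 3.

```python
# Each student name maps to another dictionary of details (hypothetical data)
students = {
    'knuth': {'id': 42.0, 'courses': ['algorithms']},
    'turing': {'id': 56.0, 'courses': ['computability', 'logic']},
}

turingId = students['turing']['id']            # index one level at a time

students['nash'] = {'id': 92.0, 'courses': []}
students['nash']['courses'].append('game theory')

missing = students.get('hopper')               # get returns None instead of raising KeyError

names = sorted(students)                       # sorted list of the keys
print(turingId)    # 56.0
print(missing)     # None
print(names)       # ['knuth', 'nash', 'turing']
```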

Example Scripts


# This is what a comment looks like 
fruits = ['apples','oranges','pears','bananas']
for fruit in fruits:
    print fruit + ' for sale'

fruitPrices = {'apples': 2.00, 'oranges': 1.50, 'pears': 1.75}
for fruit, price in fruitPrices.items():
    if price < 2.00:
        print '%s cost %f a pound' % (fruit, price)
    else:
        print fruit + ' are too expensive!'

Here is an example of Python's list comprehension construct:


nums = [1,2,3,4,5,6]
plusOneNums = [x+1 for x in nums]
oddNums = [x for x in nums if x % 2 == 1]
print oddNums
oddNumsPlusOne = [x+1 for x in nums if x % 2 ==1]
print oddNumsPlusOne

Put this code into a file called listcomp.py and run the script:





$ python listcomp.py

[1, 3, 5]

[2, 4, 6]
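A list comprehension is shorthand for a loop that builds a list. The sketch below writes the last comprehension from listcomp.py both ways to show they agree; loopResult and squares are names made up for this illustration. The dictionary comprehension at the end assumes Python 2.7 or later.

```python
nums = [1, 2, 3, 4, 5, 6]

# Comprehension form
oddNumsPlusOne = [x + 1 for x in nums if x % 2 == 1]

# Equivalent explicit loop
loopResult = []
for x in nums:
    if x % 2 == 1:
        loopResult.append(x + 1)

print(oddNumsPlusOne == loopResult)   # True

# The same idea builds dictionaries (Python 2.7 and later)
squares = {x: x * x for x in nums if x <= 3}
print(squares[3])                     # 9
```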

Beware of Indentation!

Unlike many other languages, Python uses the indentation of the source code to determine which statements belong to which block. For instance, the following script:

if 0 == 1: 
    print 'We are in a world of arithmetic pain' 
print 'Thank you for playing'

will output




Thank you for playing

But if we had written the script as


if 0 == 1: 
    print 'We are in a world of arithmetic pain'
    print 'Thank you for playing'

there would be no output. The moral of the story: be careful how you indent! It's best to use one indentation style consistently (the examples here use four spaces per level) and never to mix tabs and spaces.
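The effect of indentation can be checked programmatically. In the sketch below, the helper run and the strings outside, inside and broken are made up for this illustration: the two scripts above are executed from strings (recording messages in a list instead of printing), and a third string with inconsistent indentation is shown to be rejected before it runs. The code works under both Python 2 and Python 3.

```python
def run(source):
    """Execute a script given as a string and return the messages it records."""
    messages = []
    exec(source, {'messages': messages})
    return messages

# The second append is outside the if block, so it always runs
outside = (
    "if 0 == 1:\n"
    "    messages.append('pain')\n"
    "messages.append('thanks')\n"
)
# Both appends are inside the if block, so neither runs
inside = (
    "if 0 == 1:\n"
    "    messages.append('pain')\n"
    "    messages.append('thanks')\n"
)
print(run(outside))   # ['thanks']
print(run(inside))    # []

# Indentation that matches no enclosing block is rejected outright
broken = "if 0 == 1:\n    x = 1\n  y = 2\n"
try:
    compile(broken, '<example>', 'exec')
    rejected = False
except IndentationError:
    rejected = True
print(rejected)       # True
```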

Writing Functions

As in Scheme or Java, in Python you can define your own functions:


fruitPrices = {'apples':2.00, 'oranges': 1.50, 'pears': 1.75}

def buyFruit(fruit, numPounds):
    if fruit not in fruitPrices:
        print "Sorry we don't have %s" % (fruit)
    else:
        cost = fruitPrices[fruit] * numPounds
        print "That'll be %f please" % (cost)

# Main Function
if __name__ == '__main__':        
    buyFruit('apples',2.4)
    buyFruit('coconuts',2)

Save this script as fruit.py and run it:




$ python fruit.py

That'll be 4.800000 please

Sorry we don't have coconuts
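Functions can also return a value instead of printing, which makes the result usable by other code. The following is a sketch of that variant; costOf is a hypothetical name and is not part of fruit.py. It runs under both Python 2 and Python 3.

```python
fruitPrices = {'apples': 2.00, 'oranges': 1.50, 'pears': 1.75}

def costOf(fruit, numPounds):
    """Return the cost of numPounds of fruit, or None if we do not sell it."""
    if fruit not in fruitPrices:
        return None
    return fruitPrices[fruit] * numPounds

appleCost = costOf('apples', 2.4)
coconutCost = costOf('coconuts', 2)

print('%.2f' % appleCost)   # 4.80
print(coconutCost)          # None
```

The caller decides what to do with the result, for example printing a message when None comes back.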

Object Basics

Although this isn't a class in object-oriented programming, you'll have to use some objects in the programming projects, and so it's worth covering the basics of objects in Python. An object encapsulates data and provides functions for interacting with that data.

Defining Classes

Here's an example of defining a class named FruitShop:


class FruitShop:

    def __init__(self, name, fruitPrices):
        """
            name: Name of the fruit shop
            
            fruitPrices: Dictionary with keys as fruit 
            strings and prices for values e.g. 
            {'apples':2.00, 'oranges': 1.50, 'pears': 1.75} 
        """
        self.fruitPrices = fruitPrices
        self.name = name
        print 'Welcome to the %s fruit shop' % (name)
        
    def getCostPerPound(self, fruit):
        """
            fruit: Fruit string
        Returns cost of 'fruit', assuming 'fruit'
        is in our inventory or None otherwise
        """
        if fruit not in self.fruitPrices:
            print "Sorry we don't have %s" % (fruit)
            return None
        return self.fruitPrices[fruit]
        
    def getPriceOfOrder(self, orderList):
        """
            orderList: List of (fruit, numPounds) tuples
            
        Returns cost of orderList. Any fruit that is not in 
        our inventory is skipped and adds nothing to the cost.
        """ 
        totalCost = 0.0             
        for fruit, numPounds in orderList:
            costPerPound = self.getCostPerPound(fruit)
            if costPerPound is not None:
                totalCost += numPounds * costPerPound
        return totalCost
    
    def getName(self):
        return self.name

The FruitShop class has some data, the name of the shop and the prices per pound of some fruit, and it provides functions, or methods, on this data. What advantage is there to wrapping this data in a class? There are two reasons: 1) Encapsulating the data prevents it from being altered or used inappropriately and 2) The abstraction that objects provide makes it easier to write general-purpose code.

Using Objects

So how do we make an object and use it? Save the FruitShop implementation above to a file called shop.py. We then import the file using import shop, since shop.py is the name of the file, and make instances of FruitShop by calling shop.FruitShop('MyFruitShop', myDictionary) (i.e., filename.className([args])). We can use the FruitShop as follows:


import shop

name = 'Best Fruits'
fruitPrices = {'apples':2.00, 'oranges': 1.50, 'pears': 1.75}
myFruitShop = shop.FruitShop(name, fruitPrices)
print myFruitShop.getCostPerPound('apples')

otherName = 'Fruits R Us'
otherFruitPrices = {'kiwis':1.00, 'bananas': 1.50, 'peaches': 2.75}
otherFruitShop = shop.FruitShop(otherName, otherFruitPrices)
print otherFruitShop.getCostPerPound('bananas')

Copy the code above into a file called shopTest.py (in the same directory as shop.py) and run it:

 


$ python shopTest.py

Welcome to the Best Fruits fruit shop

2.0

Welcome to the Fruits R Us fruit shop

1.5
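The getPriceOfOrder method can be exercised in the same way. To keep the sketch self-contained, it uses a trimmed stand-in for the FruitShop class (no welcome message, and dict.get in place of the explicit membership test) rather than importing shop.py; the order data is made up for this illustration. It runs under both Python 2 and Python 3.

```python
class FruitShop(object):
    """Trimmed stand-in for the FruitShop class, for illustration only."""

    def __init__(self, name, fruitPrices):
        self.name = name
        self.fruitPrices = fruitPrices

    def getCostPerPound(self, fruit):
        # dict.get returns None when the fruit is not sold
        return self.fruitPrices.get(fruit)

    def getPriceOfOrder(self, orderList):
        totalCost = 0.0
        for fruit, numPounds in orderList:
            costPerPound = self.getCostPerPound(fruit)
            if costPerPound is not None:
                totalCost += numPounds * costPerPound
        return totalCost

bestFruits = FruitShop('Best Fruits', {'apples': 2.00, 'oranges': 1.50, 'pears': 1.75})
order = [('apples', 1.0), ('oranges', 3.0), ('coconuts', 2.0)]   # coconuts are not sold
total = bestFruits.getPriceOfOrder(order)
print('%.2f' % total)   # 6.50
```

The coconuts contribute nothing to the total because the shop has no price for them.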

Static vs Instance Variables

The following example will illustrate how to use static and instance variables in Python.
Create a file called person_class.py containing the following code:


class Person:
    population = 0
    def __init__(self, myAge):
        self.age = myAge
        Person.population += 1
    def get_population(self):
        return Person.population
    def get_age(self):
        return self.age

We first run the script to check that it loads without errors (it only defines the class, so nothing is printed):


$ python person_class.py

Now use the class as follows:


>>> import person_class

>>> p1 = person_class.Person(12)

>>> p1.get_population()

1 

>>> p2 = person_class.Person(63)

>>> p1.get_population()

2 

>>> p2.get_population()

2 

>>> p1.get_age()

12 

>>> p2.get_age()

63

In the code above, age is an instance variable and population is a static variable (usually called a class variable in Python). population is shared by all instances of the Person class whereas each instance has its own age variable.
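One pitfall is worth a sketch: assigning to the shared variable through an instance does not update the class, it creates a new instance variable with the same name on that one object. The example below uses a shortened Person class for illustration and runs under both Python 2 and Python 3.

```python
class Person(object):
    population = 0                    # shared by every instance

    def __init__(self, myAge):
        self.age = myAge              # separate for each instance
        Person.population += 1        # update through the class name

p1 = Person(12)
p2 = Person(63)
print(Person.population)    # 2

# Assigning through an instance creates an instance variable on p1 only
p1.population = 100
print(p1.population)        # 100
print(p2.population)        # 2
print(Person.population)    # 2
```

This is why __init__ above writes Person.population rather than self.population.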

Exercise: Write a function shopSmart(orders, shops) that takes an orderList (like the kind passed to FruitShop.getPriceOfOrder) and a list of FruitShop objects, and returns the FruitShop where your order costs the least in total. This function should be defined in a file called shopSmart.py. A stub implementation is provided here. Use the shop.py implementation as a "support" file.

Test Case:

orders1 = [('apples',1.0), ('oranges',3.0)]
orders2 = [('apples',3.0)]			 
dir1 = {'apples': 2.0, 'oranges':1.0}
shop1 =  shop.FruitShop('shop1',dir1)
dir2 = {'apples': 1.0, 'oranges': 5.0}
shop2 = shop.FruitShop('shop2',dir2)
shops = [shop1, shop2]

The following are true:

shopSmart.shopSmart(orders1, shops).getName() == 'shop1'

and

shopSmart.shopSmart(orders2, shops).getName() == 'shop2'

Tips and Tricks

After importing a file, if you edit its source, the changes will not be picked up by the interpreter automatically. To load them, use the reload command:

>>> reload(shop)
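The effect of reload can be demonstrated end to end. The sketch below writes a throwaway module to a temporary directory (the module name demo_prices and the variable applePrice are made up for this illustration), imports it, edits the file, and reloads it. It assumes that in Python 3 reload is imported from importlib, while in Python 2 it is a built-in.

```python
import os
import sys
import tempfile

try:
    from importlib import reload   # Python 3.4 and later
except ImportError:
    pass                           # Python 2: reload is a built-in

sys.dont_write_bytecode = True     # keep the demo free of cached .pyc files

# Write a tiny throwaway module to a temporary directory
moduleDir = tempfile.mkdtemp()
modulePath = os.path.join(moduleDir, 'demo_prices.py')
with open(modulePath, 'w') as f:
    f.write("applePrice = 2.00\n")

sys.path.insert(0, moduleDir)
import demo_prices
before = demo_prices.applePrice

# Edit the source file, as you would in an editor
with open(modulePath, 'w') as f:
    f.write("applePrice = 3.50\n")

unchanged = demo_prices.applePrice   # still the old value
reload(demo_prices)                  # re-executes the edited file
after = demo_prices.applePrice

print((before, unchanged, after))    # (2.0, 2.0, 3.5)
```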

NumPy

NumPy Basics

Let's first test NumPy by doing some simple vector operations:

>>> from numpy import *
>>> array([1,2,3,4,5])
array([1, 2, 3, 4, 5])
>>> array([1,2,3,4,5]) / 5
array([0, 0, 0, 0, 1])
>>> array([1.0,2,3,4,5])
array([ 1., 2., 3., 4., 5.])
>>> array([1.0,2,3,4,5]) / 5.0
array([ 0.2, 0.4, 0.6, 0.8, 1. ])

NumPy calls its vectors arrays, and it distinguishes integer arrays from real (floating-point) arrays. You can also specify the type directly:

>>> array([1,2,3,4,5], dtype='f') / 5
array([0.2, 0.4, 0.6, 0.8, 1.], dtype=float32)
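An array's element type can be inspected and converted explicitly. The sketch below assumes NumPy is installed and imports it as np (a common convention, used here instead of from numpy import *); the variable names are made up for this illustration. It uses // for integer division and a float divisor for real division, which behave the same under Python 2 and Python 3.

```python
import numpy as np

ints = np.array([1, 2, 3, 4, 5])
floats = np.array([1.0, 2, 3, 4, 5])    # one float makes the whole array float

print(ints.dtype.kind)      # i (integer)
print(floats.dtype.kind)    # f (floating point)

floorDiv = ints // 5                    # integer (floor) division on every element
trueDiv = ints / 5.0                    # floating-point division
asFloat = ints.astype(float)            # explicit conversion to a float array
single = np.array([1, 2, 3, 4, 5], dtype='f')   # 32-bit floats

print(floorDiv.tolist())            # [0, 0, 0, 0, 1]
print(trueDiv.tolist())             # [0.2, 0.4, 0.6, 0.8, 1.0]
print(single.dtype == np.float32)   # True
```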

We can do dot products:

>>> dot(array([1,2,3,4,5]), array([2,3,4,5,6]))
70

And matrix operations:

>>> array([[1,2,3],[4,5,6],[7,8,9]])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> array([[1,2,3],[4,5,6],[7,8,9]]).T
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
>>> array([[1,2,3],[4,5,6],[7,8,9]]) * array([1,10,20])
array([[  1,  20,  60],
       [  4,  50, 120],
       [  7,  80, 180]])
>>> dot(array([[1,2,3],[4,5,6],[7,8,9]]), array([1,10,20]))
array([ 81, 174, 267])

Here, .T means "transpose." Note that * is interpreted as point-wise (element-wise) multiplication, with the vector multiplied into each row of the matrix, and that dot is required to get a matrix/vector product.
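The difference between * and dot can be checked against a hand-written loop. This is a sketch assuming NumPy is installed and imported as np; the names M, v, elementwise, matvec and byHand are made up for this illustration. It runs under both Python 2 and Python 3.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 10, 20])

elementwise = M * v        # v is multiplied into each row of M
matvec = np.dot(M, v)      # true matrix/vector product

# The matrix/vector product is the sum across each row of the element-wise result
byHand = [sum(M[i, j] * v[j] for j in range(3)) for i in range(3)]

print(elementwise.tolist())   # [[1, 20, 60], [4, 50, 120], [7, 80, 180]]
print(matvec.tolist())        # [81, 174, 267]
print(M.T[0].tolist())        # [1, 4, 7], the first column of M
```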

Indexing is straightforward:

>>> x = array([[1,2,3,4],[5,6,7,8]])
>>> x
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> x[1,1]
6
>>> x[0,3]
4

NumPy supports slicing operations that are incredibly useful for ML applications. We can extract rows and columns in their entirety:

>>> x[0,:]
array([1, 2, 3, 4])
>>> x[:,0]
array([1, 5])
>>> x[:,0:2]
array([[1, 2],
       [5, 6]])

You can use arrays to index into other arrays. For instance, perhaps we want to extract all values of x that are greater than 5 and maybe sum them up:

>>> x>5
array([[False, False, False, False],
       [False,  True,  True,  True]], dtype=bool)
>>> x[x>5]
array([6, 7, 8])
>>> sum(x[x>5])
21
>>> (x>2) & (x<7)
array([[False, False,  True,  True],
       [ True,  True, False, False]], dtype=bool)
>>> x[(x>2) & (x<7)]
array([3, 4, 5, 6])

You can even do assignment within slices:

>>> x
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> x[x>5]
array([6, 7, 8])
>>> x[x>5] = 5
>>> x
array([[1, 2, 3, 4],
       [5, 5, 5, 5]])
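Boolean masks can also be counted, and np.where builds a modified copy without touching the original array. The sketch below assumes NumPy is installed and imported as np, which also avoids replacing Python's built-in sum the way from numpy import * does; the names mask, count, total, capped and y are made up for this illustration. It runs under both Python 2 and Python 3.

```python
import numpy as np

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
mask = x > 5                       # boolean array with the same shape as x

count = int(mask.sum())            # True counts as 1, so this counts matches
total = int(x[mask].sum())         # sum of the selected values

capped = np.where(mask, 5, x)      # new array: 5 where mask is True, x elsewhere

y = x.copy()
y[mask] = 5                        # in-place assignment on a copy

print(count)              # 3
print(total)              # 21
print(capped.tolist())    # [[1, 2, 3, 4], [5, 5, 5, 5]]
print(x.tolist())         # [[1, 2, 3, 4], [5, 6, 7, 8]]
```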

matplotlib Basics

In order to test matplotlib, let's try its default example:

>>> from pylab import randn, hist, show
>>> x = randn(10000)
>>> hist(x, 100)
>>> show()

This should pop up a histogram showing something that looks approximately Gaussian. The randn function is generating 10k random values from a standard normal and hist is generating the histogram.

We can also plot curves. Here x is assumed to be the integers 0 to 99, so that x/50.0*pi covers one full period:

>>> from pylab import plot, legend, show
>>> from numpy import arange, sin, cos, pi
>>> x = arange(100)
>>> plot(x, sin(x/50.0*pi), 'b-', x, cos(x/50.0*pi), 'r--')
>>> legend(('sin','cos'))
>>> show()

Troubleshooting

These are some problems (and their solutions) that new Python learners commonly encounter.

Problem:
ImportError: No module named py

Solution:
When using import, do not include the ".py" from the filename.
For example, you should say: import shop
NOT: import shop.py

Problem:
NameError: name 'MY VARIABLE' is not defined
Even after importing you may see this.

Solution:
To access a member of a module, you have to type MODULE_NAME.MEMBER_NAME, where MODULE_NAME is the name of the .py file, and MEMBER_NAME is the name of the variable (or function) you are trying to access.

Problem:
TypeError: 'dict' object is not callable

Solution:
Dictionary lookups are done using square brackets: [ and ]. NOT parentheses: ( and ).
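
Problem:
ValueError: too many values to unpack (or: need more than 2 values to unpack)

Solution:
Make sure the number of variables you are assigning in a for loop matches the number of elements in each item of the list. The same applies when unpacking tuples.
For example, if pair is a tuple of two elements (e.g. pair = ('apple', 2.0)), then the following code raises an unpacking error, because three variables are given but only two values are available. Python 2 words this case as "need more than 2 values to unpack"; "too many values to unpack" is the opposite case, with fewer variables than values:
(a,b,c) = pair

Here is a problematic scenario involving a for loop, where each item has two elements but three variables are given: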
```
pairList = [('apples', 2.00), ('oranges', 1.50), ('pears', 1.75)]
for fruit, price, color in pairList:
    print '%s fruit costs %f and is the color %s' % (fruit, price, color)
```
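The failure and its fix can be sketched as follows. The variable names message and lines are made up for this illustration, and the exact wording of the error message depends on the Python version, so the sketch only records it. The code runs under both Python 2 and Python 3.

```python
pairList = [('apples', 2.00), ('oranges', 1.50), ('pears', 1.75)]

# Three variables, but each item has only two elements: raises ValueError
try:
    for fruit, price, color in pairList:
        pass
    message = None
except ValueError as err:
    message = str(err)
print(message is not None)    # True

# Fixed: the number of variables matches the number of elements
lines = []
for fruit, price in pairList:
    lines.append('%s cost %.2f a pound' % (fruit, price))
print(lines[0])               # apples cost 2.00 a pound
```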

Problem:
AttributeError: 'list' object has no attribute 'length' (or something similar)

Solution:
The length of a list is found using len(NAME OF LIST).

Problem:
Changes to a file are not taking effect.

Solution:
1. Make sure you are saving all your files after any changes.
2. If you are editing a file in a window different from the one you are using to execute python, make sure you reload(YOUR_MODULE) to guarantee your changes are being reflected. reload works similarly to import.
