Friday, May 20, 2016

Sorting in NumPy

If you program in Python and work with data, you should learn NumPy. NumPy is a library for Python that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.

To use NumPy in your Python code, import the numpy module:

import numpy as np

To see how NumPy is useful, consider the following example:

persons = np.array(['Johnny','Mary','Peter','Will','Joe'])
ages = np.array([34,12,37,5,13])
heights = np.array([1.76,1.2,1.68,0.5,1.25])

In the above code snippet, I created three NumPy arrays to store a list of persons’ names, as well as their corresponding ages and heights. NumPy arrays are similar to Python lists, except that all the elements of a NumPy array have the same type. Let’s print them out and examine their contents:

print('---Before sorting---')
print(persons)
print(ages)
print(heights)

You should see the following:

---Before sorting---
['Johnny' 'Mary' 'Peter' 'Will' 'Joe']
[34 12 37  5 13]
[ 1.76  1.2   1.68  0.5   1.25]
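
The earlier point that a NumPy array holds elements of a single type can be checked through the array's dtype attribute. The following is a minimal sketch; the mixed array is hypothetical data added for illustration, and because the exact integer width depends on the platform, the code only looks at the general kind of the integer dtype.

```python
import numpy as np

ages = np.array([34, 12, 37, 5, 13])
heights = np.array([1.76, 1.2, 1.68, 0.5, 1.25])

# Every array carries one dtype that is shared by all of its elements.
print(ages.dtype.kind)     # i  (a signed integer type)
print(heights.dtype)       # float64

# Hypothetical data: mixing integers and floats upcasts everything to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)         # float64
```

This is the main practical difference from a Python list, which would happily keep the integers and the float side by side.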

Suppose we want to sort the data by one of the arrays, such as ages. After the sort, you would want to print the sorted ages as well as the corresponding names and heights. For this purpose, you can use the argsort() function in NumPy:

sort_indices = np.argsort(ages)  # performs a sort based on ages
                                 # and returns an array of indices
                                 # indicating the sort order

The argsort() function returns a NumPy array of indices, which you can examine by printing it out:

print('---Sort indices---')
print(sort_indices)

You should see the following:

---Sort indices---
[3 1 4 0 2]
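
To read the indices above: the first entry, 3, says that the smallest age sits at position 3 of the original array, the next entry, 1, points to the second smallest, and so on. The short sketch below illustrates this and shows that indexing with the argsort() result gives the same values as np.sort(). The variable name youngest is introduced here for illustration only.

```python
import numpy as np

ages = np.array([34, 12, 37, 5, 13])
sort_indices = np.argsort(ages)

# The first entry of the result is the index of the smallest age.
youngest = sort_indices[0]
print(youngest)            # 3
print(ages[youngest])      # 5

# Indexing with the argsort result yields the same values as np.sort.
print(np.array_equal(ages[sort_indices], np.sort(ages)))   # True
```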

To print all the arrays sorted by age, you can now use the sort indices to index each individual array, like this:

print('---After sorting---')
print(persons[sort_indices])
print(ages[sort_indices])
print(heights[sort_indices])

This will print out the following:

---After sorting---
['Will' 'Mary' 'Joe' 'Johnny' 'Peter']
[ 5 12 13 34 37]
[ 0.5   1.2   1.25  1.76  1.68]
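
Two details of the indexing step above are worth spelling out. Indexing with an array of indices builds a new array and leaves the original untouched, and because the same indices are applied to all three arrays, the rows stay aligned. The following is a small sketch of both points; the names sorted_persons and rows are illustrative and not part of the original example.

```python
import numpy as np

persons = np.array(['Johnny', 'Mary', 'Peter', 'Will', 'Joe'])
ages = np.array([34, 12, 37, 5, 13])
heights = np.array([1.76, 1.2, 1.68, 0.5, 1.25])

sort_indices = np.argsort(ages)

# Indexing with an index array returns a new array; persons keeps its order.
sorted_persons = persons[sort_indices]
print(persons[0])          # Johnny
print(sorted_persons[0])   # Will

# The same indices keep the three arrays aligned, so each person
# can be processed together with the matching age and height.
rows = list(zip(persons[sort_indices], ages[sort_indices], heights[sort_indices]))
for name, age, height in rows:
    print('{0}: age {1}, height {2}'.format(name, age, height))
```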

The argsort() function also works for strings, like this:

sort_indices = np.argsort(persons)
print(persons[sort_indices])
print(ages[sort_indices])
print(heights[sort_indices])
'''
prints out the following:
['Joe' 'Johnny' 'Mary' 'Peter' 'Will']
[13 34 12 37  5]
[ 1.25  1.76  1.2   1.68  0.5 ]
'''
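
One caveat when sorting strings this way: the comparison is case-sensitive, so uppercase letters sort before lowercase ones. The names in the example above are all capitalised, so it does not show up there. The sketch below uses a hypothetical names array to illustrate the effect, along with one possible workaround that sorts on a lowercased copy produced by np.char.lower().

```python
import numpy as np

# Hypothetical names with mixed capitalisation.
names = np.array(['bob', 'alice', 'Carol'])

# Default ordering: the capitalised name comes first.
print(names[np.argsort(names)])      # ['Carol' 'alice' 'bob']

# Sort on a lowercased copy to ignore case, then index the original array.
folded_order = np.argsort(np.char.lower(names))
print(names[folded_order])           # ['alice' 'bob' 'Carol']
```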

Sorting in Reverse Order

The argsort() function only sorts in ascending order. So what happens if you need to sort in descending order? Well, easy! In Python, the slice [::-1] reverses the order of a list. This applies to NumPy arrays too.

Hence, to sort the persons’ names in descending order, first sort them in ascending order and then reverse the resulting indices, like this:

reverse_sort_indices = np.argsort(persons)[::-1]
print(persons[reverse_sort_indices])
print(ages[reverse_sort_indices])
print(heights[reverse_sort_indices])

The above code snippet prints out the following:

['Will' 'Peter' 'Mary' 'Johnny' 'Joe']
[ 5 37 12 34 13]
[ 0.5   1.68  1.2   1.76  1.25]
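
For numeric arrays there is another common way to get a descending order: sort on the negated values. This is offered as an alternative sketch rather than a replacement for the [::-1] approach above, which is still needed for strings since they cannot be negated. The name oldest_first is illustrative.

```python
import numpy as np

persons = np.array(['Johnny', 'Mary', 'Peter', 'Will', 'Joe'])
ages = np.array([34, 12, 37, 5, 13])

# Sorting the negated ages in ascending order puts the largest age first.
oldest_first = np.argsort(-ages)
print(persons[oldest_first])   # ['Peter' 'Johnny' 'Joe' 'Mary' 'Will']
print(ages[oldest_first])      # [37 34 13 12  5]
```

With the data used here, both approaches produce the same indices because no two ages are equal.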

Learning More

This article just scratches the surface of what Python can do in the world of data analytics. To learn more about using Python for data analysis, come join my workshop (Introduction to Data Science using Python) at NDC Sydney 2016 on 1-2 August 2016. See you there!
