If you program in Python and work with data, you
should learn NumPy. NumPy is a Python library that adds support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions that operate on these arrays.
To use NumPy in your Python code, import the numpy module:
import numpy as np
To see how NumPy is useful, consider the following example:
persons = np.array(['Johnny','Mary','Peter','Will','Joe'])
ages = np.array([34,12,37,5,13])
heights = np.array([1.76,1.2,1.68,0.5,1.25])
In the above code snippet, I created three NumPy arrays to store
a list of persons’ names, along with their corresponding ages and heights. NumPy
arrays are similar to Python lists, except that all the elements of a NumPy array
are of the same type. Let’s print them out and examine their contents:
print('---Before sorting---')
print(persons)
print(ages)
print(heights)
You should see the following:
---Before sorting---
['Johnny' 'Mary' 'Peter' 'Will' 'Joe']
[34 12 37 5 13]
[ 1.76 1.2 1.68 0.5 1.25]
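The same-type rule can be seen through each array's dtype attribute. The short sketch below rebuilds the three arrays and also creates a hypothetical mixed array (not part of the original example) to show that NumPy converts mixed input to a single common type. The exact integer and string dtypes printed depend on your platform and Python version.

```python
import numpy as np

persons = np.array(['Johnny', 'Mary', 'Peter', 'Will', 'Joe'])
ages = np.array([34, 12, 37, 5, 13])
heights = np.array([1.76, 1.2, 1.68, 0.5, 1.25])

# each array has exactly one element type, stored in .dtype
print(persons.dtype)   # a fixed-width string type
print(ages.dtype)      # an integer type
print(heights.dtype)   # a floating-point type

# hypothetical example: mixing an int and a float gives a float array
mixed = np.array([1, 2.5])
print(mixed.dtype)     # float64
```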
Suppose we want to sort the records by one of the arrays, such
as ages. After the sort, you would want to print the sorted ages together with the
corresponding names and heights. For this purpose, you can use the argsort() function in NumPy:
sort_indices = np.argsort(ages)
# performs a sort based on ages
# and returns an array of indices
# indicating the sort order
The argsort() function
returns an array of indices, which you can examine by printing it out:
print('---Sort indices---')
print(sort_indices)
You should see the following:
---Sort indices---
[3 1 4 0 2]
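To make the meaning of these indices concrete: the first index, 3, is the position of the smallest age (5), and the last index, 2, is the position of the largest (37). The sketch below checks this and also confirms that indexing ages with the sort indices gives the same result as np.sort(). The names youngest and oldest are my own, chosen for illustration.

```python
import numpy as np

ages = np.array([34, 12, 37, 5, 13])
sort_indices = np.argsort(ages)

# positions of the smallest and the largest age
youngest = sort_indices[0]
oldest = sort_indices[-1]
print(ages[youngest])   # 5
print(ages[oldest])     # 37

# indexing with the sort indices is equivalent to sorting the array
print(np.array_equal(ages[sort_indices], np.sort(ages)))   # True
```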
To print all the arrays sorted by age, you can now use
the sort indices to index each array, like this:
print('---After sorting---')
print(persons[sort_indices])
print(ages[sort_indices])
print(heights[sort_indices])
This will print out the following:
---After sorting---
['Will' 'Mary' 'Joe' 'Johnny' 'Peter']
[ 5 12 13 34 37]
[ 0.5 1.2 1.25 1.76 1.68]
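Because the same indices are applied to all three arrays, each person's name, age and height stay lined up. As a small sketch, you could walk through the sorted arrays together using Python's built-in zip() function. The records list and the loop variable names are my own, chosen for illustration.

```python
import numpy as np

persons = np.array(['Johnny', 'Mary', 'Peter', 'Will', 'Joe'])
ages = np.array([34, 12, 37, 5, 13])
heights = np.array([1.76, 1.2, 1.68, 0.5, 1.25])

sort_indices = np.argsort(ages)

# collect one (name, age, height) record per person, youngest first
records = list(zip(persons[sort_indices],
                   ages[sort_indices],
                   heights[sort_indices]))

for name, age, height in records:
    print('{0}: age {1}, height {2}'.format(name, age, height))
```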
The argsort() function
also works for strings, like this:
sort_indices = np.argsort(persons)
print(persons[sort_indices])
print(ages[sort_indices])
print(heights[sort_indices])
'''
prints out the following:
['Joe' 'Johnny' 'Mary' 'Peter' 'Will']
[13 34 12 37 5]
[ 1.25 1.76 1.2 1.68 0.5 ]
'''
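One thing to keep in mind: strings are compared character by character using their code points, so uppercase letters sort before lowercase ones. The names in this example all start with a capital letter, so it makes no difference here. With mixed-case data, one possible approach is to sort on a lowercased copy created with np.char.lower(). The names array below is made up for illustration.

```python
import numpy as np

# hypothetical mixed-case data, not from the example above
names = np.array(['bob', 'Alice', 'carol', 'Dave'])

# default ordering: uppercase letters come before lowercase ones
default_indices = np.argsort(names)
print(names[default_indices])     # ['Alice' 'Dave' 'bob' 'carol']

# case-insensitive ordering: sort on a lowercased copy instead
ci_indices = np.argsort(np.char.lower(names))
print(names[ci_indices])          # ['Alice' 'bob' 'carol' 'Dave']
```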
Sorting in Reverse Order
The argsort() function only sorts in ascending order. So what happens if you need to sort in descending
order? Well, easy! In Python, the slice [::-1] reverses the order of a list. This applies
to NumPy arrays too.
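Here is a quick check of the [::-1] slice on both a plain list and a NumPy array. The values are arbitrary and chosen only for illustration.

```python
import numpy as np

numbers = [1, 2, 3, 4, 5]
reversed_list = numbers[::-1]
print(reversed_list)     # [5, 4, 3, 2, 1]

arr = np.array(numbers)
reversed_arr = arr[::-1]
print(reversed_arr)      # [5 4 3 2 1]
```

Note that slicing a list creates a new list, whereas slicing a NumPy array returns a view onto the same data.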
Hence, to sort the persons’ names in descending order, first sort
them in ascending order and then reverse the resulting indices, like this:
reverse_sort_indices = np.argsort(persons)[::-1]
print(persons[reverse_sort_indices])
print(ages[reverse_sort_indices])
print(heights[reverse_sort_indices])
The above code snippet prints out the following:
['Will' 'Peter' 'Mary' 'Johnny' 'Joe']
[ 5 37 12 34 13]
[ 0.5 1.68 1.2 1.76 1.25]
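For numeric arrays there is another way to get a descending order: sort on the negated values. This is a sketch of an alternative rather than something the steps above require, and it does not work for string arrays such as persons, because strings cannot be negated. The variable names are my own.

```python
import numpy as np

ages = np.array([34, 12, 37, 5, 13])

# approach used in this article: sort ascending, then reverse the indices
desc_by_reversing = np.argsort(ages)[::-1]

# alternative for numeric data: sort the negated values
desc_by_negating = np.argsort(-ages)

print(desc_by_reversing)        # [2 0 4 1 3]
print(desc_by_negating)         # [2 0 4 1 3]
print(ages[desc_by_negating])   # [37 34 13 12  5]
```

The two approaches agree here because all the ages are distinct. When values repeat, the relative order of the tied elements may differ between them.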
Learning More
This article only scratches the surface of what Python can do in the world of data analytics. To learn more about using Python for data analysis, come join my workshop (Introduction to Data Science using Python) at NDC Sydney 2016 on 1-2 August 2016. See you there!