Wednesday, May 18, 2016

Data Visualization Using Python, Pandas and Matplotlib

The Internet of Things (IoT) has been a buzzword of late. While most people associate IoT with collecting data using sensors and transmitting it to central servers, an integral part of IoT involves processing the data collected. The ability to visualize data and make intelligent decisions is the cornerstone of IoT systems.

Python is one of the preferred languages for data analytics, due to its ease of learning and its huge community support of modules and packages designed for number crunching. In this article, I am going to show you the power of Python and how you can use it to visualize data.

Collection of Blood Glucose Data

With the advancement in technologies, healthcare is one area that is receiving a lot of attention. One disease in particular, diabetes, garners much of it. According to the World Health Organization (WHO), the number of people with diabetes has risen from 108 million in 1980 to 422 million in 2014. The care and prevention of diabetes is hence of paramount importance. Diabetics need to regularly prick their fingers to measure the amount of sugar in their blood.

For this article, I am going to show you how to visualize the data collected by a diabetic so that they can see at a glance how well they are keeping their diabetes under control.

Storing the Data

For this article, I am assuming that you have a CSV file named readings.csv, which contains the following lines:

,DateTime,mmol/L
0,2016-06-01 08:00:00,6.1
1,2016-06-01 12:00:00,6.5
2,2016-06-01 18:00:00,6.7
3,2016-06-02 08:00:00,5.0
4,2016-06-02 12:00:00,4.9
5,2016-06-02 18:00:00,5.5
6,2016-06-03 08:00:00,5.6
7,2016-06-03 12:00:00,7.1
8,2016-06-03 18:00:00,5.9
9,2016-06-04 09:00:00,6.6
10,2016-06-04 11:00:00,4.1
11,2016-06-04 17:00:00,5.9
12,2016-06-05 08:00:00,7.6
13,2016-06-05 12:00:00,5.1
14,2016-06-05 18:00:00,6.9
15,2016-06-06 08:00:00,5.0
16,2016-06-06 12:00:00,6.1
17,2016-06-06 18:00:00,4.9
18,2016-06-07 08:00:00,6.6
19,2016-06-07 12:00:00,4.1
20,2016-06-07 18:00:00,6.9
21,2016-06-08 08:00:00,5.6
22,2016-06-08 12:00:00,8.1
23,2016-06-08 18:00:00,10.9
24,2016-06-09 08:00:00,5.2
25,2016-06-09 12:00:00,7.1
26,2016-06-09 18:00:00,4.9

The CSV file contains rows of data that are divided into three columns – index, date and time, and blood glucose readings in mmol/L.
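The empty first field in the header line is what pandas itself writes when a dataframe with an unnamed index is saved. The following is only a sketch under that assumption (the article does not say how readings.csv was produced); it rebuilds the first three readings and writes them to an in-memory buffer instead of a file:

```python
import io

import pandas as pd

# The first three readings from the listing above.
frame = pd.DataFrame({
    'DateTime': pd.to_datetime(['2016-06-01 08:00:00',
                                '2016-06-01 12:00:00',
                                '2016-06-01 18:00:00']),
    'mmol/L': [6.1, 6.5, 6.7],
})

# to_csv() writes the row index as the first column; because the index
# has no name, its header field is left empty.
buffer = io.StringIO()
frame.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The printed text should have the same shape as the file shown above, with the index in the first column.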

Reading the Data in Python

While Python supports lists and dictionaries for manipulating structured data, it is not well suited for manipulating numerical tables, such as the one stored in the CSV file. As such, you should use pandas. Pandas is a software library written for Python for data manipulation and analysis.

Let’s see how pandas works. Note that for this article, I am using IPython Notebook to run my Python script. The easiest way to get IPython Notebook is to download Anaconda (https://www.continuum.io/downloads). Anaconda comes with IPython Notebook, as well as pandas and matplotlib (more on this later).

Once Anaconda is installed, launch the IPython Notebook by typing the following command in Terminal:

$ ipython notebook

When IPython Notebook has started, click on New | Python 2:




Type the following statements into the cell:

import pandas as pd
data_frame = pd.read_csv('readings.csv', index_col=0, parse_dates=[1])
print data_frame

You first import the pandas module as pd, then you use the read_csv() function to read the data from the CSV file and create a dataframe. A dataframe in pandas behaves like a two-dimensional array, with an index for each row. The index_col parameter specifies which column in the CSV file will be used as the index (column 0 in this case) and the parse_dates parameter specifies the column that should be parsed as a datetime object (column 1 in this case). To run the Python script in the cell, press Ctrl-Enter.

When you print out the dataframe, you should see the following:

              DateTime  mmol/L
0  2016-06-01 08:00:00     6.1
1  2016-06-01 12:00:00     6.5
2  2016-06-01 18:00:00     6.7
3  2016-06-02 08:00:00     5.0
4  2016-06-02 12:00:00     4.9
5  2016-06-02 18:00:00     5.5
6  2016-06-03 08:00:00     5.6
7  2016-06-03 12:00:00     7.1
8  2016-06-03 18:00:00     5.9
9  2016-06-04 09:00:00     6.6
10 2016-06-04 11:00:00     4.1
11 2016-06-04 17:00:00     5.9
12 2016-06-05 08:00:00     7.6
13 2016-06-05 12:00:00     5.1
14 2016-06-05 18:00:00     6.9
15 2016-06-06 08:00:00     5.0
16 2016-06-06 12:00:00     6.1
17 2016-06-06 18:00:00     4.9
18 2016-06-07 08:00:00     6.6
19 2016-06-07 12:00:00     4.1
20 2016-06-07 18:00:00     6.9
21 2016-06-08 08:00:00     5.6
22 2016-06-08 12:00:00     8.1
23 2016-06-08 18:00:00    10.9
24 2016-06-09 08:00:00     5.2
25 2016-06-09 12:00:00     7.1
26 2016-06-09 18:00:00     4.9
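To see what parse_dates actually changes, you can compare the column types with and without it. This is a minimal sketch that inlines a three-row excerpt of readings.csv so it runs on its own; it uses print() with parentheses so that it also runs under Python 3, and the variable names are mine:

```python
import io

import pandas as pd

# A three-row excerpt of readings.csv, inlined so the sketch is self-contained.
sample = io.StringIO(
    ",DateTime,mmol/L\n"
    "0,2016-06-01 08:00:00,6.1\n"
    "1,2016-06-01 12:00:00,6.5\n"
    "2,2016-06-01 18:00:00,6.7\n"
)

# With parse_dates, column 1 is converted to a datetime type.
parsed = pd.read_csv(sample, index_col=0, parse_dates=[1])
print(parsed.dtypes)

# Without parse_dates, the same column is left as plain text.
sample.seek(0)
unparsed = pd.read_csv(sample, index_col=0)
print(unparsed.dtypes)
```

The exact dtype names printed depend on the pandas version, but only the first dataframe should report a datetime type for the DateTime column.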

You can print out the index of the dataframe by using the index property:

print data_frame.index

You should see the index as follows:

Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
           dtype='int64')

You can also print out the individual columns of the dataframe:

print data_frame['DateTime']

This should print out the DateTime column of the dataframe:

0    2016-06-01 08:00:00
1    2016-06-01 12:00:00
2    2016-06-01 18:00:00
3    2016-06-02 08:00:00
4    2016-06-02 12:00:00
5    2016-06-02 18:00:00
6    2016-06-03 08:00:00
7    2016-06-03 12:00:00
8    2016-06-03 18:00:00
9    2016-06-04 09:00:00
10   2016-06-04 11:00:00
11   2016-06-04 17:00:00
12   2016-06-05 08:00:00
13   2016-06-05 12:00:00
14   2016-06-05 18:00:00
15   2016-06-06 08:00:00
16   2016-06-06 12:00:00
17   2016-06-06 18:00:00
18   2016-06-07 08:00:00
19   2016-06-07 12:00:00
20   2016-06-07 18:00:00
21   2016-06-08 08:00:00
22   2016-06-08 12:00:00
23   2016-06-08 18:00:00
24   2016-06-09 08:00:00
25   2016-06-09 12:00:00
26   2016-06-09 18:00:00
Name: DateTime, dtype: datetime64[ns]

Likewise, you can also print the mmol/L column:

print data_frame['mmol/L']

You should see the following:

0      6.1
1      6.5
2      6.7
3      5.0
4      4.9
5      5.5
6      5.6
7      7.1
8      5.9
9      6.6
10     4.1
11     5.9
12     7.6
13     5.1
14     6.9
15     5.0
16     6.1
17     4.9
18     6.6
19     4.1
20     6.9
21     5.6
22     8.1
23    10.9
24     5.2
25     7.1
26     4.9
Name: mmol/L, dtype: float64
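Once a column has been selected, you can compute simple summaries on it or use it to filter rows. The sketch below rebuilds the last six readings by hand so that it is self-contained. The threshold of 7.0 is a hypothetical value chosen purely for illustration; it is not part of the original article and not a clinical guideline:

```python
import pandas as pd

# The last six readings from readings.csv (rows 21 to 26).
data_frame = pd.DataFrame(
    {
        'DateTime': pd.to_datetime([
            '2016-06-08 08:00:00', '2016-06-08 12:00:00',
            '2016-06-08 18:00:00', '2016-06-09 08:00:00',
            '2016-06-09 12:00:00', '2016-06-09 18:00:00',
        ]),
        'mmol/L': [5.6, 8.1, 10.9, 5.2, 7.1, 4.9],
    },
    index=range(21, 27),
)

glucose = data_frame['mmol/L']   # a single column is a pandas Series
highest = glucose.max()
average = glucose.mean()
print(highest)
print(average)

THRESHOLD = 7.0  # hypothetical cut-off, for illustration only
high_rows = data_frame[glucose > THRESHOLD]  # keep rows above the cut-off
print(high_rows)
```

Indexing the dataframe with a boolean Series, as in the last statement, keeps only the rows where the condition is true.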

Visualizing the Data

Let’s now try to visualize the data by displaying a chart. For this purpose, let’s use matplotlib. Matplotlib is a plotting library for the Python language and is integrated right into pandas.

Add the following statements (the %matplotlib inline line, the numpy import, and the plot() call) to the existing Python script:

%matplotlib inline

import pandas as pd
import numpy as np

data_frame = pd.read_csv('readings.csv', index_col=0, parse_dates=[1])
print data_frame
print data_frame.index
print data_frame['DateTime']
print data_frame['mmol/L']

data_frame.plot(x='DateTime', y='mmol/L')

The “%matplotlib inline” statement instructs IPython notebook to plot the matplotlib chart inline. You can directly plot a chart using the dataframe’s plot() function. The x parameter specifies the column to use for the x-axis and the y parameter specifies the column to use for the y-axis.

This will display the chart as follows:


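The plot() function also returns a matplotlib Axes object, which you can use to adjust the chart or save it. The following is a sketch for running outside IPython Notebook; the 'Agg' backend, the in-memory PNG buffer, and the y-axis label text are my assumptions, not part of the original script:

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: draw off-screen instead of inline
import pandas as pd

# A three-row excerpt of the readings, built by hand to stay self-contained.
data_frame = pd.DataFrame({
    'DateTime': pd.to_datetime(['2016-06-01 08:00:00',
                                '2016-06-01 12:00:00',
                                '2016-06-01 18:00:00']),
    'mmol/L': [6.1, 6.5, 6.7],
})

# plot() returns the Axes that the line was drawn on.
axes = data_frame.plot(x='DateTime', y='mmol/L')
axes.set_ylabel('Blood glucose (mmol/L)')

# Render the figure to PNG bytes in memory rather than to a file.
png = io.BytesIO()
axes.figure.savefig(png, format='png')
print(len(png.getvalue()), 'bytes of PNG data')
```

In a notebook with %matplotlib inline you do not need the backend line or the buffer, as the chart is displayed directly below the cell.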
  
You can add a title to the chart by importing the matplotlib.pyplot module and using its title() function:

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data_frame = pd.read_csv('readings.csv', index_col=0, parse_dates=[1])
print data_frame

print data_frame.index
print data_frame['DateTime']
print data_frame['mmol/L']

data_frame.plot(x='DateTime', y='mmol/L')
plt.title('Blood Glucose Readings for John', color='Red')

A title is now displayed for the chart:


By default, the plot() function draws a line chart. You can change the chart type by using the kind parameter:

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data_frame = pd.read_csv('readings.csv', index_col=0, parse_dates=[1])
print data_frame

print data_frame.index
print data_frame['DateTime']
print data_frame['mmol/L']

data_frame.plot(kind='bar', x='DateTime', y='mmol/L')
plt.title('Blood Glucose Readings for John', color='Red')

The chart is now changed to a bar chart:



Besides a bar chart, you can also display an area chart:

data_frame.plot(kind='area', x='DateTime', y='mmol/L')

The chart is now displayed as an area chart:



You can also set the color for the area chart by using the color parameter:

data_frame.plot(kind='area', x='DateTime', y='mmol/L', color='r')

The area is now in red:

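To compare the chart types side by side, you can draw each kind onto its own subplot. This is a sketch only; the three-row sample, the figure size, and the off-screen 'Agg' backend are assumptions I have added so that it can run outside a notebook:

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: draw off-screen instead of inline
import matplotlib.pyplot as plt
import pandas as pd

# A three-row excerpt of the readings, built by hand to stay self-contained.
data_frame = pd.DataFrame({
    'DateTime': pd.to_datetime(['2016-06-01 08:00:00',
                                '2016-06-01 12:00:00',
                                '2016-06-01 18:00:00']),
    'mmol/L': [6.1, 6.5, 6.7],
})

kinds = ['line', 'bar', 'area']

# One subplot per chart type, stacked vertically.
fig, axes = plt.subplots(len(kinds), 1, figsize=(6, 9))
for ax, kind in zip(axes, kinds):
    # The ax parameter tells pandas which subplot to draw on.
    data_frame.plot(kind=kind, x='DateTime', y='mmol/L', ax=ax)
    ax.set_title(kind)

fig.tight_layout()

# Render all three charts to PNG bytes in memory.
png = io.BytesIO()
fig.savefig(png, format='png')
```

Passing ax to plot() is what keeps each chart on its own subplot; without it, pandas creates a new figure for every call.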

Learning More


This article only scratches the surface of what Python can do in the world of data analytics. To learn more about using Python for data analysis, come join my workshop (Introduction to Data Science using Python) at NDC Sydney 2016 on 1-2 August 2016. See you there!

Tuesday, May 03, 2016

IOT101 now comes with more sensors and a 7" Touch Display!

Past participants of the IOT101 course were deluged with sensors and equipment; it is now going to get worse! We are adding a large number of sensors as well as the official Raspberry Pi 7" Touch Screen Display (with 10 Finger Capacitive Touch) to the IOT Kit. That is to say, you are going to get a box of gadgets that will keep you busy for the week after the course!

What's more, we are using the latest Raspberry Pi 3, which comes with Bluetooth LE and WiFi built in.


* Items in the IoT Kit subject to change based on availability



Monday, May 02, 2016

Programming Smartwatches

Smartwatches have been around for quite some time. While some do not live up to their hype, others are getting better as their software and hardware improve.

We have two courses for those of you who are interested in smartwatch programming:

  • IOS302 - Advanced iOS - Apple Watch Programming
  • AND304 - Advanced Android - Android Wear Programming

For each course, participants will get either an Android Wear device, or an Apple Watch.

iOS Boot Camp

If your company is planning to go into iOS development, the 5-Day iOS Boot Camp is the most cost-effective way to get your developers jumpstarted. Available in Swift or Objective-C, this course focuses on all the important aspects of iOS development to jumpstart your developers in the shortest time.  We can conduct this course in house, or you can send your developers to our open classes.

Topics include:

  • Introduction to Objective-C or Swift
  • Storyboard
  • Location-Based Services
  • Design Patterns
  • Protocols and Delegates
  • Databases
  • Web Services
  • Background Fetch
  • Network Connectivity

We have conducted this course successfully worldwide. Contact Wei-Meng Lee @ weimenglee@learn2develop.net for details such as costing, venue, as well as in-house arrangements.

New Data Science Series in the IOT Suite of Courses

As I have mentioned many times, IoT (Internet of Things) is much more than collecting tons of data. It also involves analysing data to derive sense (forgive me for the pun) out of them and to use them to make intelligent decisions. Hence, we now have a new series of courses (more to be added) focusing on Data Science:

  • IOT201 - Learning the Python Programming Language
  • IOT202 - Introduction to Data Science using Python 

Python is such a useful language for manipulating data that we are making it the first language to learn in this series. It is also easy for beginners to pick up, and this makes learning Data Science much more palatable.

Learn iOS Programming using Swift

Learn how to program your iOS devices using Swift. In this course, you will learn the fundamental building blocks of iOS programming:

  • Crash course in Swift
  • Using Storyboard
  • Views and View Controllers
  • Different types of applications
  • Location Based Services
  • Displaying Maps
  • File Storage
  • Background Fetch
  • SQLite Database

In addition, participants will also get 2 Swift cheat sheets updated to the latest version of Swift. These 2 cheat sheets are handy companions for every Swift developer! You can also download your own copy here:


 

Learn Android Programming using Android Studio 2

Learn Android Programming using the latest Android Studio 2. In this course you will learn all the fundamental building blocks of Android programming:
  • Activities
  • Intents
  • Broadcasts and Broadcast Receivers
  • Google Maps
  • Location Based Services
  • Databases
  • File Storage

At the end of the course, you will have the knowledge to build some cool and exciting Android apps and test them on your real devices!


Friday, April 15, 2016

Changes in Swift 2.2 - Selectors

Prior to Swift 2.2, you could use a string literal for Objective-C selectors. For example, in the following code snippet you could specify onTimer as the selector:

    func onTimer() {
        ...
    }

    override func viewDidLoad() {
        super.viewDidLoad()
        
        delta = CGPointMake(12.0,4.0)
        NSTimer.scheduledTimerWithTimeInterval(0.05,
                                               target:self,
                                               selector:"onTimer",
                                               userInfo:nil,
                                               repeats:true)

    }

Because there is no checking to ensure that the selector name is a well-formed selector (or that it even refers to a valid method), this method of naming a selector is dangerous and hence deprecated in Swift 2.2. 

Instead, you should now use the new #selector expression that allows you to build a selector from a reference to a method. The above code now looks like this:

    override func viewDidLoad() {
        super.viewDidLoad()

        NSTimer.scheduledTimerWithTimeInterval(0.05,
                                               target:self,
                                               selector:#selector(ViewController.onTimer),
                                               userInfo:nil,
                                               repeats:true)
    }

If the target method accepts arguments, then the selector looks like this:

    func doSomething(num1:Int, num2:Int) {
        ...
    }

    override func viewDidLoad() {
        super.viewDidLoad()        
        
        NSTimer.scheduledTimerWithTimeInterval(0.05,
                                               target:self,
                                               selector:#selector(
                                               ViewController.doSomething(_:num2:)),
                                               userInfo:nil,
                                               repeats:true)
    }

While the old method of using a string literal for a selector is still supported in Swift 2.2, you should use the new syntax when updating your code.

Sunday, March 13, 2016

IOT201 - Learning the Python Programming Language - 5 April 2016

The year 2016 is the year of Python. With the growing importance of and interest in Data Science, Python is increasingly used as the language of choice to crunch big numbers. With its burgeoning collection of third-party libraries, Python is well suited to be the ideal language for data scientists. Come and join us in this one-day course on Python programming, and learn the essence of Python to get ready for the path to Data Science!
IOT201 - Learning the Python Programming Language
Course Fee
S$799 (nett; no GST)
If your company is sponsoring you for the training, your company can enjoy 400% tax deductions/ allowances and/or 60% cash payout for investment in innovation and productivity improvements under the Productivity and Innovation Credit (PIC) scheme. For more details, check out the Productivity and Innovation Credit page. 
Schedules
Start Date         End Date           Details    Category
Tue Apr 05 2016    Tue Apr 05 2016    PDF
Venue
Bayview Hotel Singapore
30 Bencoolen Street
Singapore 189621 

Wednesday, March 09, 2016

IOT courses in March - IOT101 and IOT102

The IOT courses at the end of March are all confirmed. It's going to be another week of fun with electronics and programming, and of course possibilities! Let your imaginations soar with all the projects that you will embark on, and see how you can deploy them in your own environment!

We are getting ready to test all the Pis to make sure they are ready for the big day!

WEB301 - Developing and Deploying Web Apps using Amazon Web Services (AWS) - 16-17 June 2016

Thinking of hosting your own Web services and applications? Not sure what languages to learn for developing your backend services? Join us in this new course on AWS and Node.js. You will learn the key services of AWS - EC2, ELB, RDS, S3, as well as how to develop backend services using Node.js.
WEB301 - Developing and Deploying Web Apps using Amazon Web Services (AWS)
Course Fee
S$1197 (nett; no GST)
If your company is sponsoring you for the training, your company can enjoy 400% tax deductions/ allowances and/or 60% cash payout for investment in innovation and productivity improvements under the Productivity and Innovation Credit (PIC) scheme. For more details, check out the Productivity and Innovation Credit page. 
Schedules
Start Date         End Date           Details    Category
Thu Jun 16 2016    Fri Jun 17 2016    PDF
Venue
Bayview Hotel Singapore
30 Bencoolen Street
Singapore 189621 

Monday, March 07, 2016

AND304 - Advanced Android - Android Wear Programming - 23 March 2016

For the upcoming Android Wear course, we will be using the LG Urbane Android Wear watch for testing and development (which you can keep at the end of the course).
AND304 - Advanced Android - Android Wear Programming
Course Fee
S$1297 (nett; no GST)
If your company is sponsoring you for the training, your company can enjoy 400% tax deductions/ allowances and/or 60% cash payout for investment in innovation and productivity improvements under the Productivity and Innovation Credit (PIC) scheme. For more details, check out the Productivity and Innovation Credit page. 
Schedules
Start Date                   End Date           Details    Category
Wed Mar 23 2016 CONFIRMED    Wed Mar 23 2016    PDF
Venue
Bayview Hotel Singapore
30 Bencoolen Street
Singapore 189621