
Thursday, December 01, 2016

IOT202 - Introduction to Data Science using Jupyter Notebook (Anaconda)

"Without data you're just another person with an opinion"
W. Edwards Deming, Data Scientist

Learn how to visualise and analyse your data using Python and its associated libraries. In this course, you will use Jupyter Notebook from the Anaconda distribution and learn how to:

  • Visualise data using matplotlib
  • Clean data sets using NumPy
  • Perform Data Analytics using Pandas

IOT202 - Introduction to Data Science using Python
Course Fee
S$1297 (nett; no GST)
If your company is sponsoring you for the training, it can enjoy 400% tax deductions/allowances and/or a 40% cash payout for investment in innovation and productivity improvements under the Productivity and Innovation Credit (PIC) scheme. For more details, check out the Productivity and Innovation Credit page.
Schedules
Start Date          End Date            Category
Mon Dec 05 2016     Tue Dec 06 2016     CONFIRMED
Thu Feb 16 2017     Fri Feb 17 2017
Thu Mar 23 2017     Fri Mar 24 2017
Venue
Hotel Grand Pacific Singapore
101 Victoria Street
Singapore 188018

If your company requires in-house training, you can contact us to customize the topics to meet your training requirements. We train worldwide! We have conducted customized classes in the United States, Canada, Norway, Denmark, Japan, China, Hong Kong, Taiwan, and Thailand.

Monday, November 21, 2016

Course Updates

Courses Confirmed for December 2016

Here is the list of courses that are confirmed for December 2016:


  • IOT202 - Introduction to Data Science using Python
  • AND101 - Fundamentals of Android Programming using Android Studio
  • IOS101 - Fundamentals of iPhone Programming using Swift
  • IOT101 - Programming Internet of Things (IoT) using Raspberry Pi
  • IOT104 - Programming Internet of Things (IoT) using Arduino
  • WEB202 - Implementing iOS and Android Push Notifications

Changes to Course Duration and Content

MOB104 - Writing Cross Platform iOS and Android Apps using Xamarin.Forms and C#


MOB104 is now a 2-day course. With Microsoft's focus on cross-platform developer tools, Xamarin.Forms is now more important than ever. In this course, you will learn how to develop cross-platform mobile apps for Android and iOS, as well as some advanced techniques such as:

  • Cross-platform User Interface
  • Dependency Injection
  • MVVM Architecture

IOS302 - Advanced iOS - Apple Watch Programming


With the latest watchOS from Apple, the Apple Watch is now a much more compelling wearable device. Apps now launch more quickly, and battery life is much improved on the new Apple Watch Series 2. We have since revised this course to 2 days, so that we have time to cover all the cool new features in watchOS 3. (The course comes with an Apple Watch Series 2.)

IOT201 - Python Programming


IOT201 is now titled "Python Programming". Instead of focusing just on the syntax of Python, we now cover how to do interesting things with Python, such as:

  • Developing RESTful service using Flask
  • Securing your RESTful service using SSL and Basic Authentication
  • Sockets programming using Python
  • Push notifications using Python
  • Writing your own modules

As such, IOT201 is now a 2-day course.

Tuesday, November 08, 2016

Instructor-led training for your company and partners

Are you looking for instructor-led developer training for your company? If you are, then you are at the right place. I have travelled around the world, conducting training for companies in areas such as:

  • iOS (Objective-C and Swift)
  • Android
  • Bluetooth Low Energy
  • Xamarin 
  • IoT (Raspberry Pi and Arduino)
  • Node.js
  • Amazon Web Services
  • Python and Data Science

In-house training is most cost-effective for companies with a class size as small as 5. My hands-on training focuses on learning-by-doing and is code-intensive. Participants are expected to code and get their hands dirty. This is the most effective way to learn a new technology.

If you are a training provider and would like to bring my trainings to your partners, contact me at weimenglee@learn2develop.net to start the conversation now.

Photo by Scott Kvitberg Photography

Wednesday, July 27, 2016

Code Magazine - Introduction to IoT Using the Raspberry Pi

My latest article (co-authored with Clarence Chng) for Code Magazine is online now - http://www.codemag.com/Article/1607071.

This article introduces readers to the Internet of Things (IoT) using the Raspberry Pi. Have fun and enjoy!

Saturday, May 28, 2016

Learning Data Science using Python

Come and join us in our first Data Science series of courses:

IOT201 - Learning the Python Programming Language
IOT202 - Introduction to Data Science using Python


In IOT201, you will learn about the Python programming language and the various features that make it one of the most popular programming languages for beginners as well as advanced programmers.

Once you have mastered the basics of Python, head over to IOT202, where you will learn about the various modules and libraries that allow Python to crunch large amounts of data. In particular, you will learn:

  • NumPy - an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.
  • Pandas - a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
  • Matplotlib - a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

Once you have learned the basics of NumPy, Pandas, and Matplotlib, it is time to make use of them to analyse your data. In particular, you will analyse:

  • Birth Rates in Singapore 
  • Different types of infectious diseases affecting Singapore 
  • Prevalence of diseases in adults in Singapore 
  • Historical trends of stocks
  • And more!

Friday, May 20, 2016

Sorting in NumPy

If you are into Python programming and are manipulating data, you should learn NumPy. NumPy is an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.

To use NumPy in your Python code, import the numpy module:

import numpy as np

To see how NumPy is useful, consider the following example:

persons = np.array(['Johnny','Mary','Peter','Will','Joe'])
ages = np.array([34,12,37,5,13])
heights = np.array([1.76,1.2,1.68,0.5,1.25])

In the above code snippet, I created three NumPy arrays to store a list of persons’ names, as well as their corresponding ages and heights. NumPy arrays are similar to Python lists, except that NumPy arrays contain elements of the same type. Let’s print them out and examine their contents:

print '---Before sorting---'
print persons
print ages
print heights

You should see the following:

---Before sorting---
['Johnny' 'Mary' 'Peter' 'Will' 'Joe']
[34 12 37  5 13]
[ 1.76  1.2   1.68  0.5   1.25]
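
As an aside (this snippet is my own addition, not part of the original example), it is easy to see that NumPy arrays hold elements of a single type: if you mix types, NumPy coerces everything to a common type.

mixed = np.array([1, 'two', 3.0])   # numbers and a string mixed together
print(mixed)        # every element is coerced to a string, e.g. ['1' 'two' '3.0']
print(mixed.dtype)  # a string dtype such as '|S3' (Python 2) or '<U3' (Python 3)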

Suppose we want to sort the elements based on a particular field, such as age. After the sort, you would want to print the sorted ages as well as the corresponding names and heights. For this purpose, you can use the argsort() function in NumPy:

sort_indices = np.argsort(ages)  # performs a sort based on ages
                                 # and returns an array of indices
                                 # indicating the sort order

The argsort() function returns an array of indices, which you can examine by printing it out:

print '---Sort indices---'
print sort_indices

You should see the following:

---Sort indices---
[3 1 4 0 2]
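
For comparison (this sketch is my addition), np.sort() would sort the age values themselves, but you would then lose the link between each age and the corresponding person, which is why argsort() is used here:

print(np.sort(ages))     # [ 5 12 13 34 37] -- the sorted values only
print(np.argsort(ages))  # [3 1 4 0 2]      -- the indices that would sort the array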

To print all the other arrays sorted by age, you can now pass the sort indices into the individual arrays, like this:

print '---After sorting---'
print persons[sort_indices]
print ages[sort_indices]
print heights[sort_indices]

This will print out the following:

---After sorting---
['Will' 'Mary' 'Joe' 'Johnny' 'Peter']
[ 5 12 13 34 37]
[ 0.5   1.2   1.25  1.76  1.68]

The argsort() function also works for strings, like this:

sort_indices = np.argsort(persons)
print persons[sort_indices]
print ages[sort_indices]
print heights[sort_indices]
'''
prints out the following:
['Joe' 'Johnny' 'Mary' 'Peter' 'Will']
[13 34 12 37  5]
[ 1.25  1.76  1.2   1.68  0.5 ]
'''

Sorting in Reverse Order

The argsort() function only sorts in ascending order. So what happens if you need to sort in descending order? Well, easy! In Python, [::-1] reverses the order of a list. This applies to NumPy arrays too.
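
If the slicing syntax is new to you, here is a quick illustrative sketch (my addition) of [::-1] applied to a plain Python list and to a NumPy array:

print([1, 2, 3, 4][::-1])            # [4, 3, 2, 1]
print(np.array([1, 2, 3, 4])[::-1])  # [4 3 2 1]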

Hence, to sort the persons’ names in descending order, first obtain the sort indices in ascending order and then reverse them, like this:

reverse_sort_indices = np.argsort(persons)[::-1]
print persons[reverse_sort_indices]
print ages[reverse_sort_indices]
print heights[reverse_sort_indices]

The above code snippet prints out the following:

['Will' 'Peter' 'Mary' 'Johnny' 'Joe']
[ 5 37 12 34 13]
[ 0.5   1.68  1.2   1.76  1.25]
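
For purely numeric fields such as ages, another common idiom (my addition, not covered above) is to call argsort() on the negated array, which gives a descending sort directly:

desc_by_age = np.argsort(-ages)  # negating the values sorts them in descending order
print(persons[desc_by_age])      # ['Peter' 'Johnny' 'Joe' 'Mary' 'Will']
print(ages[desc_by_age])         # [37 34 13 12  5]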

Learning More

This article is just touching on the surface of what Python can do in the world of data analytics. To learn more about using Python for data analysis, come join my workshop (Introduction to Data Science using Python) at NDC Sydney 2016 on the 1-2 August 2016. See you there!

Wednesday, May 18, 2016

Data Visualization Using Python, Pandas and Matplotlib

The Internet of Things (IoT) has been the buzzword of late. While most people associate IoT with collecting data using sensors and transmitting it to central servers, an integral part of IoT involves processing the data collected. The ability to visualize data and make intelligent decisions is the cornerstone of IoT systems.

Python is one of the preferred languages for data analytics, due to its ease of learning and its huge community support of modules and packages designed for number crunching. In this article, I am going to show you the power of Python and how you can use it to visualize data.

Collection of Blood Glucose Data

With the advancement in technologies, healthcare is one area that is receiving a lot of attention. One particular disease, diabetes, garners a lot of attention. According to the World Health Organization (WHO), the number of people with diabetes has risen from 108 million in 1980 to 422 million in 2014. The care and prevention of diabetes is hence of paramount importance. Diabetics need to regularly prick their fingers to measure the amount of sugar in their blood.

For this article, I am going to show you how to visualize the data collected by a diabetic so that he can see at a glance how well he is keeping his diabetes under control.

Storing the Data

For this article, I am assuming that you have a CSV file named readings.csv, which contains the following lines:

,DateTime,mmol/L
0,2016-06-01 08:00:00,6.1
1,2016-06-01 12:00:00,6.5
2,2016-06-01 18:00:00,6.7
3,2016-06-02 08:00:00,5.0
4,2016-06-02 12:00:00,4.9
5,2016-06-02 18:00:00,5.5
6,2016-06-03 08:00:00,5.6
7,2016-06-03 12:00:00,7.1
8,2016-06-03 18:00:00,5.9
9,2016-06-04 09:00:00,6.6
10,2016-06-04 11:00:00,4.1
11,2016-06-04 17:00:00,5.9
12,2016-06-05 08:00:00,7.6
13,2016-06-05 12:00:00,5.1
14,2016-06-05 18:00:00,6.9
15,2016-06-06 08:00:00,5.0
16,2016-06-06 12:00:00,6.1
17,2016-06-06 18:00:00,4.9
18,2016-06-07 08:00:00,6.6
19,2016-06-07 12:00:00,4.1
20,2016-06-07 18:00:00,6.9
21,2016-06-08 08:00:00,5.6
22,2016-06-08 12:00:00,8.1
23,2016-06-08 18:00:00,10.9
24,2016-06-09 08:00:00,5.2
25,2016-06-09 12:00:00,7.1
26,2016-06-09 18:00:00,4.9

The CSV file contains rows of data that are divided into three columns – index, date and time, and blood glucose readings in mmol/L.
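
If you prefer not to type the file by hand, a short sketch like the following (my own helper, assuming you are happy to create readings.csv in the current working directory) generates it with pandas; only the first day's readings are shown, so extend the lists for the full data set:

import pandas as pd

readings = pd.DataFrame({
    'DateTime': ['2016-06-01 08:00:00', '2016-06-01 12:00:00', '2016-06-01 18:00:00'],
    'mmol/L':   [6.1, 6.5, 6.7],
}, columns=['DateTime', 'mmol/L'])

# to_csv() writes the row index as the first, unnamed column, matching the file above
readings.to_csv('readings.csv')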

Reading the Data in Python

While Python supports lists and dictionaries for manipulating structured data, it is not well suited for manipulating numerical tables, such as the one stored in the CSV file. As such, you should use pandas. Pandas is a software library written for Python for data manipulation and analysis.

Let’s see how pandas works. Note that for this article, I am using IPython Notebook for running my Python script. The best way to use IPython Notebook is to download Anaconda (https://www.continuum.io/downloads). Anaconda comes with IPython Notebook, as well as pandas and matplotlib (more on this later).

Once Anaconda is installed, launch the IPython Notebook by typing the following command in Terminal:

$ ipython notebook

When IPython Notebook has started, click on New | Python 2:




Type the following statements into the cell:

import pandas as pd
data_frame = pd.read_csv('readings.csv', index_col=0, parse_dates=[1])
print data_frame

You first import the pandas module as pd, then you use the read_csv() function to read the data from the CSV file and create a dataframe. A dataframe in pandas behaves like a two-dimensional array, with an index for each row. The index_col parameter specifies which column in the CSV file will be used as the index (column 0 in this case) and the parse_dates parameter specifies the column that should be parsed as a datetime object (column 1 in this case). To run the Python script in the cell, press Ctrl-Enter.

When you print out the dataframe, you should see the following:

              DateTime  mmol/L
0  2016-06-01 08:00:00     6.1
1  2016-06-01 12:00:00     6.5
2  2016-06-01 18:00:00     6.7
3  2016-06-02 08:00:00     5.0
4  2016-06-02 12:00:00     4.9
5  2016-06-02 18:00:00     5.5
6  2016-06-03 08:00:00     5.6
7  2016-06-03 12:00:00     7.1
8  2016-06-03 18:00:00     5.9
9  2016-06-04 09:00:00     6.6
10 2016-06-04 11:00:00     4.1
11 2016-06-04 17:00:00     5.9
12 2016-06-05 08:00:00     7.6
13 2016-06-05 12:00:00     5.1
14 2016-06-05 18:00:00     6.9
15 2016-06-06 08:00:00     5.0
16 2016-06-06 12:00:00     6.1
17 2016-06-06 18:00:00     4.9
18 2016-06-07 08:00:00     6.6
19 2016-06-07 12:00:00     4.1
20 2016-06-07 18:00:00     6.9
21 2016-06-08 08:00:00     5.6
22 2016-06-08 12:00:00     8.1
23 2016-06-08 18:00:00    10.9
24 2016-06-09 08:00:00     5.2
25 2016-06-09 12:00:00     7.1
26 2016-06-09 18:00:00     4.9
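
To double-check that parse_dates really converted the DateTime column, you can print the column types (a quick sanity check I have added here; dtypes is a standard dataframe attribute):

print(data_frame.dtypes)
# DateTime    datetime64[ns]
# mmol/L             float64
# dtype: object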

You can print out the index of the dataframe by using the index property:

print data_frame.index

You should see the index as follows:

Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
           dtype='int64')

You can also print out the individual columns of the dataframe:

print data_frame['DateTime']

This should print out the DateTime column of the dataframe:

0    2016-06-01 08:00:00
1    2016-06-01 12:00:00
2    2016-06-01 18:00:00
3    2016-06-02 08:00:00
4    2016-06-02 12:00:00
5    2016-06-02 18:00:00
6    2016-06-03 08:00:00
7    2016-06-03 12:00:00
8    2016-06-03 18:00:00
9    2016-06-04 09:00:00
10   2016-06-04 11:00:00
11   2016-06-04 17:00:00
12   2016-06-05 08:00:00
13   2016-06-05 12:00:00
14   2016-06-05 18:00:00
15   2016-06-06 08:00:00
16   2016-06-06 12:00:00
17   2016-06-06 18:00:00
18   2016-06-07 08:00:00
19   2016-06-07 12:00:00
20   2016-06-07 18:00:00
21   2016-06-08 08:00:00
22   2016-06-08 12:00:00
23   2016-06-08 18:00:00
24   2016-06-09 08:00:00
25   2016-06-09 12:00:00
26   2016-06-09 18:00:00
Name: DateTime, dtype: datetime64[ns]

Likewise, you can also print the mmol/L column:

print data_frame['mmol/L']

You should see the following:

0      6.1
1      6.5
2      6.7
3      5.0
4      4.9
5      5.5
6      5.6
7      7.1
8      5.9
9      6.6
10     4.1
11     5.9
12     7.6
13     5.1
14     6.9
15     5.0
16     6.1
17     4.9
18     6.6
19     4.1
20     6.9
21     5.6
22     8.1
23    10.9
24     5.2
25     7.1
26     4.9
Name: mmol/L, dtype: float64
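
Before plotting, it can also be useful to get some quick summary statistics for the readings. This sketch (my addition) uses the column's standard describe() and mean() methods:

print(data_frame['mmol/L'].describe())  # count, mean, std, min, quartiles and max
print(data_frame['mmol/L'].mean())      # average reading over the period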

Visualizing the Data

Let’s now try to visualize the data by displaying a chart. For this purpose, let’s use matplotlib. Matplotlib is a plotting library for the Python language and is integrated right into pandas.

Add the following statements to the existing Python script (the new lines are the %matplotlib inline directive, the numpy import, and the final plot() call):

%matplotlib inline

import pandas as pd
import numpy as np

data_frame = pd.read_csv('readings.csv', index_col=0, parse_dates=[1])
print data_frame
print data_frame.index
print data_frame['DateTime']
print data_frame['mmol/L']

data_frame.plot(x='DateTime', y='mmol/L')

The “%matplotlib inline” statement instructs IPython notebook to plot the matplotlib chart inline. You can directly plot a chart using the dataframe’s plot() function. The x parameter specifies the column to use for the x-axis and the y parameter specifies the column to use for the y-axis.

This will display the chart as follows:


  
You can add a title to the chart by importing the matplotlib module and using the title() function:

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data_frame = pd.read_csv('readings.csv', index_col=0, parse_dates=[1])
print data_frame

print data_frame.index
print data_frame['DateTime']
print data_frame['mmol/L']

data_frame.plot(x='DateTime', y='mmol/L')
plt.title('Blood Glucose Readings for John', color='Red')

A title is now displayed for the chart:


By default, matplotlib will display a line chart. You can change the chart type by using the kind parameter:

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data_frame = pd.read_csv('readings.csv', index_col=0, parse_dates=[1])
print data_frame

print data_frame.index
print data_frame['DateTime']
print data_frame['mmol/L']

data_frame.plot(kind='bar', x='DateTime', y='mmol/L')
plt.title('Blood Glucose Readings for John', color='Red')

The chart is now changed to a bar chart:



Besides displaying a bar chart, you can also display an area chart:

data_frame.plot(kind='area', x='DateTime', y='mmol/L')

The chart is now displayed as an area chart:



You can also set the color for the area chart by using the color parameter:

data_frame.plot(kind='area', x='DateTime', y='mmol/L', color='r')

The area is now in red:


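If you want to keep a copy of any of these charts outside the notebook, a minimal sketch (my addition, using matplotlib's standard savefig() function) looks like this:

data_frame.plot(kind='area', x='DateTime', y='mmol/L', color='r')
plt.title('Blood Glucose Readings for John', color='Red')
plt.savefig('readings.png')  # writes the current figure to a PNG file in the working directory
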
Learning More


This article is just touching on the surface of what Python can do in the world of data analytics. To learn more about using Python for data analysis, come join my workshop (Introduction to Data Science using Python) at NDC Sydney 2016 on the 1-2 August 2016. See you there!