Panda's

Dumb Start with Panda’s day 2

Today I’m gonna brush up on lambda functions with a quick overview. This is a follow-up to the IPython notebook for lambda functions. A lambda function lets us write a small function definition inline, right inside a call to map. Lambdas give the same flexibility with filter, so the two can be combined to pull out just the data we want.

For example, let's take a range of 20 numbers and return the squares of the numbers which are divisible by 2.

You could do this normally with a list comprehension like

seq = range(20)
[num**2 for num in seq if num %2 == 0]

If you would like to use map and filter, it could look like


?map #To know about the documentation

?filter

list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, seq)))
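Here is a minimal sketch putting the two approaches side by side, assuming Python 3 (where map and filter return lazy iterators, hence the list() call). The variable names are mine, chosen only for illustration.

```python
seq = range(20)

# List comprehension: square every number that is divisible by 2.
squares_comprehension = [num ** 2 for num in seq if num % 2 == 0]

# Same result with filter + map: filter keeps the even numbers,
# map then squares each one that survived the filter.
evens = filter(lambda x: x % 2 == 0, seq)
squares_map_filter = list(map(lambda x: x ** 2, evens))

print(squares_map_filter)
# [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
```

Both versions give the same list; the comprehension is usually considered the more readable of the two.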

For reference, check out the IPython notebook in the link above.

This finishes the tutorial on IPython and raw Python data analysis.

I’m going to start on NumPy and will be covering the following topics:

  • Helpful Methods and Shortcuts
  • Vectorization
  • Multi-Dimensional Arrays
  • Boolean Selection
  • Querying, Slicing and Splitting Arrays.
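As a rough preview of a few of these topics, here is a small sketch of vectorization, a multi-dimensional array, slicing and splitting. The array contents and variable names are just assumptions made up for this illustration.

```python
import numpy as np

a = np.arange(6)              # array([0, 1, 2, 3, 4, 5])

# Vectorization: the operation is applied to every element, no explicit loop.
doubled = a * 2               # array([ 0,  2,  4,  6,  8, 10])

# Multi-dimensional array: view the same six values as 2 rows x 3 columns.
grid = a.reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]

# Querying and slicing: first row, and the last column of every row.
first_row = grid[0]           # array([0, 1, 2])
last_col = grid[:, -1]        # array([2, 5])

# Splitting: cut the 1-D array into two equal halves.
left, right = np.split(a, 2)  # array([0, 1, 2]) and array([3, 4, 5])
```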

NumPy is the fundamental package for scientific computing in Python. It is the base for several data analysis and machine learning tools such as pandas, matplotlib and scikit-learn.

In the IPython shell, if you would like to learn more about NumPy arrays, check out the NumPy documentation:


?numpy

In short, it helps to perform random number generation, fast calculations, linear algebra, Fourier transforms and a lot of other complex mathematical computation. More documentation is available on the NumPy website. For more IPython notes, check this link
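To make that list a bit more concrete, here is a small sketch touching each of those areas. The seed, the matrix and the signal are arbitrary values I picked for illustration, and np.random.default_rng assumes a reasonably recent NumPy (1.17 or later).

```python
import numpy as np

# Random number generation (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)      # 1000 draws from a standard normal

# Fast element-wise calculation without a Python loop.
values = np.arange(5)
squares = values ** 2                # array([ 0,  1,  4,  9, 16])

# Linear algebra: solve A @ x = b for x.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)            # approximately [0.8, 1.4]

# Fourier transform: a constant signal has all its energy at frequency 0.
signal = np.ones(8)
spectrum = np.fft.fft(signal)        # first entry is 8, the rest are 0
```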

Let's take a look at Boolean Selection. It helps us filter certain values, just like the if condition in a list comprehension. For example, in the traditional shell, if we wanted to get the list of numbers divisible by 2 out of 20, we would write:


seq = range(20)

[x for x in seq if x % 2 == 0]

But with a NumPy array you could simply apply the condition


import sys

import numpy as np

npa = np.arange(20)

npa[npa % 2 == 0]
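It can help to see what happens in two steps. This sketch (the names mask, evens and big_evens are mine) first builds the boolean array and then uses it as an index; the last line combines two conditions with &, which needs parentheses around each one.

```python
import numpy as np

npa = np.arange(20)

# Step 1: the comparison yields an array of True/False, one per element.
mask = npa % 2 == 0

# Step 2: indexing with that mask keeps only the True positions.
evens = npa[mask]
# array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

# Conditions can be combined element-wise with & (and) or | (or).
big_evens = npa[(npa % 2 == 0) & (npa > 10)]
# array([12, 14, 16, 18])
```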

 

Also, the cool thing about the IPython shell is that you can use magic functions to display inline graphs and to benchmark the execution time of certain statements.


%timeit

#Above is a magic function to measure the time of the execution.

Let's take the same divisibility example to take a look at %timeit


npa = np.arange(20)

%timeit [x for x in npa if x % 2 == 0]
#Result: The slowest run took 8.50 times longer than the fastest. This could mean that an
#intermediate result is being cached
100000 loops, best of 3: 10.2 µs per loop
#Result: haha, this is because I haven't turned off my machine for almost a week and there is
#a lot of load on the processor.

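With the same example, boolean selection is usually way faster, especially as the array grows. On my loaded machine, though, this 20-element run came out slower than the list comprehension above (405 µs versus 10.2 µs per loop), so take a single measurement like this with a grain of salt.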


%timeit npa[npa % 2 == 0]
#Result: 1000 loops, best of 3: 405 µs per loop
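If you want to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module instead of the %timeit magic. The array size of 100,000 and the number of runs are my own assumptions, picked so the difference is easier to see; the actual timings will depend on your machine, so none are shown here.

```python
import timeit

import numpy as np

# Larger than the 20 elements used above; the size is an arbitrary choice.
npa = np.arange(100_000)


def with_comprehension():
    # Python-level loop over every element of the array.
    return [x for x in npa if x % 2 == 0]


def with_boolean_selection():
    # The comparison and the selection both run inside NumPy.
    return npa[npa % 2 == 0]


runs = 5
t_list = timeit.timeit(with_comprehension, number=runs) / runs
t_mask = timeit.timeit(with_boolean_selection, number=runs) / runs

print(f"list comprehension: {t_list * 1e6:.1f} µs per run")
print(f"boolean selection:  {t_mask * 1e6:.1f} µs per run")
```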

Dumb Start with Panda’s day 1

It's a great year to start learning pandas, one of the most popular Python libraries for data analysis and machine learning work. I will also focus, at a high level, on the scikit-learn tools.

I started learning Python while doing my internship at Intel, where I was asked to develop an application supporting collection in highly sophisticated labs. For installing packages, I used pip (the Python package installer). But to speed up the process of learning pandas, I’m going to use Anaconda (a distribution that bundles a collection of Python packages), which enables data analysis and library management.

It doesn’t require root access to the computer!!!


Anyway! If you would like to go without Anaconda, I think you could use pip, but make sure you have virtualenv installed so that the packages you install don’t affect your original system installation.

Installation of Anaconda: http://docs.continuum.io/anaconda/install

Today I will skim through the following

IPYTHON and Raw python data analysis :-

  1. Ipython and the Ipython notebook
  2. Maps
  3. Filters
  4. List Comprehension
  5. Lambda Function

Next thing, launch the Anaconda app and click on ipython-notebook, which will automatically launch the terminal and start the development server for you, with Jupyter running in the browser on localhost.


Basically, the IPython notebook is divided into cells. In each cell I can execute lines of code and get the output right below the cell. For description purposes I can use Markdown text.


Very quickly, here are the key features which could be useful in future: Tab completion, and Shift+Enter to execute code and start a new cell, which works very much like a normal Python interpreter. To play around with more of IPython's magic functions, use

Ip[]: %quickref

Also, one more great use I observed is benchmarking, such as finding out how many nanoseconds it takes to run a particular function with the use of timeit.

Ip[]: %timeit square(x)

Spoiler Alert! If you change a variable's value in the notebook, remember that it is a global variable shared by every cell. For example, if you define x = 5 at the top of the notebook and some time later define x = 4, then x is 4 from that point on, including in any earlier cell you re-run.
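A tiny sketch of that gotcha, written as a plain script where each commented section stands in for a notebook cell. The square function here is a hypothetical helper of mine that reads the global x; it is not taken from the notebooks.

```python
# --- cell 1 ---
x = 5


# --- cell 2 ---
def square():
    # Reads the global x at call time, not at definition time.
    return x ** 2


first = square()   # 25, because x is 5 right now

# --- cell 3 ---
x = 4              # rebinding x changes it for every cell

# --- re-running cell 2's call ---
second = square()  # 16, the same call now sees x == 4

print(first, second)
# 25 16
```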

  1. Intro:-

https://github.com/praneethkumarpidugu/Pandas/blob/master/day%201%20Intro%20to%20Ipython%20and%20Raw%20Data%20Analysis/Intro.ipynb

  2. Maps:-

https://github.com/praneethkumarpidugu/Pandas/blob/master/day%201%20Intro%20to%20Ipython%20and%20Raw%20Data%20Analysis/Maps.ipynb

  3. Filters:-

https://github.com/praneethkumarpidugu/Pandas/blob/master/day%201%20Intro%20to%20Ipython%20and%20Raw%20Data%20Analysis/filters.ipynb

  4. List Comprehensions:-

https://github.com/praneethkumarpidugu/Pandas/blob/master/day%201%20Intro%20to%20Ipython%20and%20Raw%20Data%20Analysis/List_comprehensions.ipynb