Generating data for Linear Regression using NumPy

We have already seen how to generate random numbers in a previous article. Here we will look at how to generate data in a specific format for linear regression.

To test linear regression, we need one data set that has a roughly linear relationship and one data set that is purely random. The code below generates both kinds of data using Python and NumPy. I have provided graphs to help you understand the data these programs create.

Data with Linear Trend for Linear Regression

Data without any Trend for Linear Regression

You can use this as input data while training your model.
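As a minimal sketch of the two data sets described above, the snippet below builds one series with a linear trend plus noise and one series with no trend at all. The slope, intercept, noise scale, value ranges and seed are illustrative assumptions, not values from the original programs.

```python
import numpy as np

# Seeded generator so the example is repeatable (the seed is arbitrary).
rng = np.random.default_rng(42)

n = 50
x = np.linspace(0, 10, n)

# Data with a linear trend: a straight line plus Gaussian noise.
# Slope 2.0, intercept 5.0 and noise scale 1.5 are assumed, illustrative values.
y_linear = 2.0 * x + 5.0 + rng.normal(0.0, 1.5, size=n)

# Data without any trend: values drawn independently of x.
y_random = rng.uniform(0.0, 25.0, size=n)

# The correlation with x should be high for the linear data
# and much weaker for the random data.
print("linear r:", np.corrcoef(x, y_linear)[0, 1])
print("random r:", np.corrcoef(x, y_random)[0, 1])
```

Plotting `x` against `y_linear` and `y_random` with a scatter plot should give pictures similar in spirit to the two graphs above.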

Generating Random Numbers With NumPy

We often need some data for testing, or simply some random numbers. NumPy is very effective at generating random integers, random floats, or random values between 0 and 1. The values are pseudo-random, and you can draw them from a uniform distribution as well as from a normal distribution.

The following program shows multiple methods of creating random numbers for use in a program.
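A short sketch of these methods, assuming NumPy's `default_rng` generator interface (the original program may have used the older `np.random.*` functions instead); the ranges, sizes and seed are arbitrary illustrative choices.

```python
import numpy as np

# Seeded generator so repeated runs give the same numbers (seed is arbitrary).
rng = np.random.default_rng(0)

# Random integers from 1 to 6; the upper bound (7) is exclusive.
ints = rng.integers(1, 7, size=5)

# Random floats in the half-open interval [0, 1).
unit = rng.random(3)

# Random floats drawn uniformly from [10, 20).
floats = rng.uniform(10.0, 20.0, size=4)

# Values drawn from a normal distribution with mean 0 and standard deviation 1.
normal = rng.normal(loc=0.0, scale=1.0, size=1000)

print("integers:", ints)
print("unit floats:", unit)
print("uniform floats:", floats)
print("normal sample mean/std:", normal.mean(), normal.std())
```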

Brief History of Machine Learning

The term “Machine Learning” was coined by Arthur Samuel in 1959 while he was at IBM.

Brief History of ML

Date Details
1950 Alan Turing creates the “Turing Test” to determine if a computer has real intelligence. To pass the test, a computer must be able to fool a human into believing it is also human.
1952 Arthur Samuel wrote the first computer learning program. The program played the game of checkers, and the IBM computer improved at the game the more it played, studying which moves made up winning strategies and incorporating those moves into its program.
1957 Frank Rosenblatt designed the first neural network for computers (the perceptron)
1967 The “nearest neighbor” algorithm was written, allowing computers to begin using very basic pattern recognition. This could be used to map a route for traveling salesmen, starting at a random city but ensuring they visit all cities during a short tour.
1979 Students at Stanford University invent the “Stanford Cart” which can navigate obstacles in a room on its own.
1981 Gerald Dejong introduces the concept of Explanation Based Learning (EBL), in which a computer analyses training data and creates a general rule it can follow by discarding unimportant data.
1985 Terry Sejnowski invents NetTalk, which learns to pronounce words the same way a baby does.
1997 IBM’s Deep Blue beats the world champion at chess.
2006 Geoffrey Hinton coins the term “deep learning” to explain new algorithms that let computers “see” and distinguish objects and text in images and videos.
2008 DJ Patil and Jeff Hammerbacher coined the term “Data Scientist”
2011 IBM’s Watson beats its human competitors at Jeopardy.
2012 Google’s X Lab develops a machine learning algorithm that is able to autonomously browse YouTube videos to identify the videos that contain cats.
2014 Facebook develops DeepFace, a software algorithm that is able to recognize or verify individuals in photos at the same level as humans can.
2016 Google’s artificial intelligence algorithm beats a professional player at the Chinese board game Go, which is considered the world’s most complex board game and is many times harder than chess.

 

According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics. He also suggested the term data science as a placeholder name for the overall field. Below is one of the most famous Venn diagrams for data science.

How ML Is Different from AI

In the early days of AI, an increasing emphasis on the logical, knowledge-based approach caused a rift between AI and machine learning. By 1980, expert systems had come to dominate AI, and statistics was out of favor.

Machine learning, reorganized as a separate field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics and probability theory. It also benefited from the increasing availability of digitized information, and the ability to distribute it via the Internet.

Here is another famous Venn diagram.

Hal Varian, Google’s chief economist, predicted in 2008 that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them. Data are becoming the new raw material of business: an economic input almost on a par with capital and labour.

Machine Learning is a peer-reviewed scientific journal, published since 1986

Further reading

  1. https://en.wikipedia.org/wiki/Machine_learning
  2. A Very Short History Of Data Science
  3. Data, data everywhere

Linear Regression Using Tensorflow

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. You can read more about what regression is, the types of regression, and linear regression in the related articles.

Generally, you won't use TensorFlow for problems like linear regression; they are better addressed by libraries such as scikit-learn or SciPy. However, this is a great starting point for understanding TensorFlow.

Here is the code

################################################################################################
# name: TensorFlow_Linear_Regression_01.py
# desc: Linear Regression using TensorFlow
# date: 2019-02-03
# Author: conquistadorjd
################################################################################################
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt

print('*** Program Started ***')
########## Input Data Creation
n = 20
x = np.arange(-n/2,n/2,1,dtype=np.float64)

m_real = np.random.uniform(0.8,0.9,(n,))
b_real = np.random.uniform(5,10,(n,))
print('m_real', type(m_real[0]))
y = x*m_real +b_real

########## Variables definition
m = tf.Variable(np.random.uniform(5,15,(1,)))
b = tf.Variable(np.random.uniform(5,15,(1,)))

########## Display input data and datatypes
print('x', x, type(x), type(x[0]))
print('y', y, type(y), type(y[0]))
print('m', m, type(m))
print('b', b, type(b))

########## Plot input to see the data
# plt.scatter(x,y,s=None, marker='o',color='g',edgecolors='g',alpha=0.9,label="Linear Relation")
# plt.grid(color='black', linestyle='--', linewidth=0.5,markevery=int)
# plt.legend(loc=2)
# plt.axis('scaled')
# plt.show()

########## Compute model and loss
model = tf.add(tf.multiply(x,m), b)
loss = tf.reduce_mean(tf.pow(model - y, 2))

########## Use following model if you get TypeError
# model = tf.add(tf.multiply(x, tf.cast(m, tf.float64)), tf.cast(b, tf.float64))
# loss = tf.reduce_mean(tf.pow(model - tf.cast(y, tf.float64), 2))
###########################################################################################

# Create optimizer
learn_rate = 0.01 # you can use 0.1/0.01/0.001 to test the output
num_epochs = 500 # Test output accuracy for different epochs
num_batches = n
optimizer = tf.train.GradientDescentOptimizer(learn_rate).minimize(loss)

########## Initialize variables
init = tf.global_variables_initializer()

########## Launch session
with tf.Session() as sess:
    sess.run(init)
    print('*** Initialize')

    ########## This is where training happens
    for epoch in range(num_epochs):
        for batch in range(num_batches):
            sess.run(optimizer)

    ########## Display and plot results
    print('m = ', sess.run(m))
    print('b = ', sess.run(b))

    x1 = np.linspace(-10,10,50)
    y1 = sess.run(m)*x1+sess.run(b)

    plt.scatter(x,y,s=None, marker='o',color='g',edgecolors='g',alpha=0.9,label="Linear Relation")
    plt.grid(color='black', linestyle='--', linewidth=0.5,markevery=int)
    plt.legend(loc=2)
    plt.axis('scaled')

    plt.plot(x1, y1, 'r')
    plt.savefig('TensorFlow_Linear_Regression_01.png')
    plt.show()

print('*** Program ended ***')

 

You can change the input and see how the output changes. If you get NaN values in the TensorFlow output, lower the learning rate from 0.01 to 0.001 in the following line

learn_rate = 0.01 # you can use 0.1/0.01/0.001 to test the output

 

Here is the output

*** Program Started ***
m_real <class 'numpy.float64'>
x [-10.  -9.  -8.  -7.  -6.  -5.  -4.  -3.  -2.  -1.   0.   1.   2.   3.
   4.   5.   6.   7.   8.   9.] <class 'numpy.ndarray'> <class 'numpy.float64'>
y [-0.12267011  1.99923466 -1.82417449  3.70960816 -0.07838254  2.49865561
  6.01521568  4.72467689  4.26350466  6.29306134  6.56424532  6.37343995
  9.1530143   9.99292287 13.1932482   9.23547055 11.28963328 12.00597972
 14.64760425 14.58158682] <class 'numpy.ndarray'> <class 'numpy.float64'>
m <tf.Variable 'Variable:0' shape=(1,) dtype=float64_ref> <class 'tensorflow.python.ops.variables.RefVariable'>
b <tf.Variable 'Variable_1:0' shape=(1,) dtype=float64_ref> <class 'tensorflow.python.ops.variables.RefVariable'>
2019-02-03 16:10:20.898092: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
*** Initialize
m =  [0.79374898]
b =  [7.12266825]
*** Program ended ***

You can ignore the line “2019-02-03 16:10:20.898092: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA”. We will discuss it later.
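To see what the gradient descent optimizer is doing, here is a rough NumPy-only sketch of the same idea: it minimizes the mean squared error by hand and compares the result with NumPy's closed-form least-squares fit (`np.polyfit`). This is not the TensorFlow implementation, and the data (slope 0.85, intercept 7.5, unit noise), the starting values and the step count are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative data: the slope, intercept and noise level are assumed values.
rng = np.random.default_rng(1)
x = np.arange(-10, 10, 1, dtype=np.float64)
y = 0.85 * x + 7.5 + rng.normal(0.0, 1.0, size=x.size)

# Start from a deliberately poor guess for slope (m) and intercept (b).
m, b = 10.0, 10.0
learn_rate = 0.01

for _ in range(5000):
    error = m * x + b - y              # residuals of the current line
    grad_m = 2.0 * np.mean(error * x)  # d(MSE)/dm
    grad_b = 2.0 * np.mean(error)      # d(MSE)/db
    m -= learn_rate * grad_m           # step against the gradient
    b -= learn_rate * grad_b

# Closed-form least-squares fit for comparison (degree-1 polynomial).
m_ls, b_ls = np.polyfit(x, y, 1)

print("gradient descent:", m, b)
print("least squares:   ", m_ls, b_ls)
```

With a learning rate that is too large for the data, the updates overshoot and the values grow without bound, which is one way NaN values can appear in the output.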

TensorFlow Tutorial : Basics

In TensorFlow, the term tensor refers to the representation of data as a multi-dimensional array, whereas the term flow refers to the series of operations performed on tensors.

In TensorFlow, computation is described using a kind of flowchart of operations, called a data flow graph. Each node of the graph represents an instance of a mathematical operation (like addition, division, or multiplication) and each edge is a multi-dimensional data set (tensor) on which the operations are performed. The input goes in at one end, flows through this system of operations, and comes out the other end as output.

A tensor is an n-dimensional array (a generalization of vectors and matrices) that can represent all types of data. All values in a tensor share the same data type, and the tensor has a known (or partially known) shape. The shape describes the size of the array along each dimension. In deep learning you generally deal with high-dimensional data sets, where dimensions refer to the different features present in the data set.

  • 0-d tensor: scalar (number)
  • 1-d tensor: vector
  • 2-d tensor: matrix
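The ranks listed above can be illustrated with NumPy arrays, used here only as a stand-in for tensors; this is an assumption made for illustration, since TensorFlow has its own tensor type. The values are arbitrary.

```python
import numpy as np

scalar = np.array(5.0)                        # 0-d: a single number
vector = np.array([1.0, 2.0, 3.0])            # 1-d: a vector
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2-d: a matrix
cube = np.zeros((2, 3, 4))                    # 3-d: a stack of matrices

# ndim is the number of dimensions (rank); shape gives the size of each one.
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, "rank:", t.ndim, "shape:", t.shape, "dtype:", t.dtype)
```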

Constants

If you need constants with specific values inside your training model, you can use the constant object.

rate = tf.constant(15.2, name="rate", dtype=tf.float32)

Variables

Variables in TensorFlow are in-memory buffers containing tensors. They have to be explicitly initialized, and they are used in the graph to maintain state across session runs. Simply calling the constructor adds the variable to the computational graph.

name = tf.Variable("techtrekking.com", name="name")

A graph is a set of computations that take place successively. TensorFlow is built around this graph framework: the graph gathers and describes all the computations done during training.

Each operation is called an op node, and op nodes are connected to each other.

A placeholder is TensorFlow's way of letting developers inject data into the computation graph. Placeholders are bound inside expressions, so developers can create operations, and the computational graph in general, without providing the data in advance. The data can then be supplied at runtime from external sources.

distance = tf.placeholder(tf.float32, name="distance")

A Session object encapsulates the environment in which Operation objects are executed and Tensor objects are evaluated. To actually evaluate the nodes, we must run the computational graph within a session.

A session encapsulates the control and state of the TensorFlow runtime.
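How constants, placeholders, op nodes and a session-like run step fit together can be sketched with a toy graph in plain Python. This is only a conceptual model, not how TensorFlow is implemented: the `Node` class and the `constant`, `placeholder`, `multiply` and `run` helpers are hypothetical names invented for this sketch.

```python
import operator


class Node:
    """A node in a toy data flow graph (hypothetical, not a TensorFlow class)."""

    def __init__(self, op=None, inputs=(), value=None, name=""):
        self.op = op          # function applied to the input values, if any
        self.inputs = inputs  # upstream nodes (the incoming edges)
        self.value = value    # fixed value for constants
        self.name = name

    def __repr__(self):
        return f"Node({self.name!r})"


def constant(value, name="const"):
    return Node(value=value, name=name)


def placeholder(name):
    # No value yet: it must be supplied when the graph is run.
    return Node(name=name)


def multiply(a, b, name="mul"):
    return Node(op=operator.mul, inputs=(a, b), name=name)


def run(node, feed=None):
    """Evaluate a node, playing the role of a session run in this toy model."""
    feed = feed or {}
    if node in feed:
        return feed[node]
    if node.op is None:
        if node.value is None:
            raise ValueError(f"no value fed for placeholder {node.name!r}")
        return node.value
    return node.op(*(run(i, feed) for i in node.inputs))


rate = constant(15.2, name="rate")
distance = placeholder("distance")
fare = multiply(rate, distance, name="fare")

print(fare)                              # building the graph computes nothing
print(run(fare, feed={distance: 10.0}))  # values flow only when we run it
```

Building `fare` only records which operation to apply to which inputs; the multiplication happens when `run` is called with a value for the placeholder, which mirrors the build-then-run split described above.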

Common functions

TensorFlow operator Description
tf.add x+y
tf.subtract x-y
tf.multiply x*y
tf.div x/y
tf.mod x % y
tf.abs |x|
tf.negative -x
tf.sign sign(x)
tf.square x*x
tf.round round(x)
tf.sqrt sqrt(x)
tf.pow x^y
tf.exp e^x
tf.log log(x)
tf.maximum max(x, y)
tf.minimum min(x, y)
tf.cos cos(x)
tf.sin sin(x)

TensorBoard lets you monitor graphically what TensorFlow is doing. This can be useful for gaining a better understanding of machine learning models. We will look at TensorBoard in a separate article.

TensorFlow : Getting Started

TensorFlow is an open source machine learning framework for everyone. It was developed by the Google Brain team for internal Google use and was released under the Apache 2.0 open-source license. This framework matters because Google uses it in production.

There is a lot of fluff in data science, and one company that has actually used data science at scale is Google. From Google Search to Google Photos to YouTube videos, Google has done amazing things with data science.

Please check how to install TensorFlow to get it installed. Once it's installed, we will take a look at a very basic program to get you started.

Here is the most basic example: a simple multiplication of two numbers

Here is output

python tensorflow_basics_01.py
*** Program Started ***
Tensor("Mul:0", shape=(), dtype=int32)
2019-01-19 21:23:01.021124: I tensorflow/core/platform/cpu_feature_guard.cc:141]
Your CPU supports instructions that this TensorFlow binary was not compiled to
use: AVX2
14
*** Program Ended ***

Please note that when we printed the result, it displayed Tensor("Mul:0", shape=(), dtype=int32). This is because TensorFlow has not run the computation yet; it has only generated the graph. This is also called lazy evaluation. We need to create a session and then run it to get the output.

Let us have a look at another simple program doing multiplication of matrices

 

Adding and fetching hyperlink in Microsoft Excel

Microsoft Excel is the world's most widely used data analytics tool. It was widely used even before the term data analytics became widespread.

Fetching hyperlink from Hyperlinked cell

We often have to deal with hyperlinks in Excel. If you have text from which you want to extract the hyperlink, there is no ready-made formula available. You can do it in multiple ways, and the simplest is to create your own custom function:

  1. Press Alt+F11 to open the Visual Basic editor.
  2. Choose Insert -> Module.
  3. Add the following code:

    Function GetURL(cell As Range, Optional default_value As Variant)
        'Lists the hyperlink address for a given cell
        'If the cell does not contain a hyperlink, return default_value
        If (cell.Range("A1").Hyperlinks.Count <> 1) Then
            GetURL = default_value
        Else
            GetURL = cell.Range("A1").Hyperlinks(1).Address
        End If
    End Function

  4. Type =GetURL( in any cell, select the cell that has the hyperlink, and close the parenthesis. The formula returns only the hyperlink address.

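How to Add a Hyperlink to a Cell

If you have a column of hyperlinks in your Excel sheet and you want to attach a hyperlink to any text, you can do it with the HYPERLINK function, which takes the link location and the text to display: =HYPERLINK(link_location, friendly_name)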

How to install tensorflow on windows

TensorFlow is very easy to install. Let us look at how to get started with TensorFlow.

pip install tensorflow
C:\Users\ABCDEFG>pip install tensorflow
Collecting tensorflow
  Downloading https://files.pythonhosted.org/packages/05/cd/c171d2e33c0192b04560
ce864c26eba83fed888fe5cd9ded661b2702f2ae/tensorflow-1.12.0-cp36-cp36m-win_amd64.
whl (45.9MB)
    71% |██████████████████████?         | 32.6MB 119kB/s eta 0:
01:52

 

How to restrict plugin access to multisite

WordPress multisite is a great tool that makes it easy to create a network of blogs. However, you need to be careful when giving users full control, because WordPress plugins can be misused by sub-sites.

If you want to disable plugins for sub-sites, you can do so very easily. Log in as super admin and go to

Network Admin –> Settings –> Network Settings

Scroll all the way down and un-check the plugin check box.

How to clone gist

Gist is one of the most efficient ways to share code snippets, single files and full applications with other people. One disadvantage of gists is that you can't share directories, but this is not a major issue considering that gists are primarily used to share code snippets.

If you want to make local changes to a gist and push them up to the web, you can clone the gist, make changes and then commit them. It is exactly the same process as with any Git repository.

Let us look at how to clone a gist repository using HTTPS.

Go to the gist repository and get the HTTPS link. The image below shows where to find the HTTPS link.

Use the following command to clone the repository

$ git clone https://gist.github.com/820c117b75d52514b2e58008be07a6eb.git
Cloning into '820c117b75d52514b2e58008be07a6eb'...
remote: Enumerating objects: 44, done.
remote: Total 44 (delta 0), reused 0 (delta 0), pack-reused 44
Unpacking objects: 100% (44/44), done.
Checking connectivity... done.

That is it. You are done. You can cd into the folder and check the files.