## Generating data for Linear Regression using NumPy

We have already seen how to generate random numbers in a previous article; here we look at how to generate data in a specific format for linear regression.

To test linear regression, we need one data set with a roughly linear relationship and one set of purely random data. The code below generates both using Python and NumPy. I have provided graphs which will help you understand the data created by these programs.

### Data with Linear Trend for Linear Regression

    ################################################################################################
    # name: numpy_data_linear.py
    # desc: Generate test data having linear relationship
    # date: 2019-02-02
    # Author: conquistadorjd
    ################################################################################################
    import numpy as np
    from matplotlib import pyplot as plt

    print('*** Program Started ***')

    n = 50
    x = np.arange(-n/2, n/2, 1, dtype=np.float64)

    # A separate slope and intercept per point scatters the data around a linear trend
    m = np.random.uniform(0.3, 0.5, (n,))
    b = np.random.uniform(5, 10, (n,))
    y = x*m + b

    print('x', x, type(x[0]))
    print('y', y, type(y[0]))

    plt.scatter(x, y, s=None, marker='o', color='g', edgecolors='g', alpha=0.9, label="Linear Relation")
    plt.grid(color='black', linestyle='-', linewidth=0.5)
    plt.legend(loc=2)
    plt.axis('scaled')
    # Save before show(): after the plot window is closed the saved figure can be blank
    plt.savefig('numpy_data_linear.jpeg')
    plt.show()

    print('*** Program ended ***')
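
A common alternative to the per-point slope and intercept used in the script above is a fixed line plus Gaussian noise. The sketch below assumes that approach; `true_slope`, `true_intercept`, `noise_std` and the seed are illustrative values, not taken from the original script.

```python
import numpy as np

# Illustrative values (assumptions, not from the original script)
true_slope = 0.4
true_intercept = 7.5
noise_std = 1.5

rng = np.random.default_rng(seed=0)  # seeded so the data is reproducible

n = 50
x = np.arange(-n / 2, n / 2, 1, dtype=np.float64)

# Fixed line plus zero-mean Gaussian noise on every point
noise = rng.normal(loc=0.0, scale=noise_std, size=n)
y = true_slope * x + true_intercept + noise

print(x.shape, y.shape)  # (50,) (50,)
```

With this form the "true" slope and intercept are known, which makes it easy to check how close a fitted model gets to them.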


### Data without any Trend for Linear Regression

    ################################################################################################
    # name: numpy_data_random_01.py
    # desc: Generate random test data without any linear relationship
    # date: 2019-02-02
    # Author: conquistadorjd
    ################################################################################################
    import numpy as np
    from matplotlib import pyplot as plt

    print('*** Program Started ***')

    n = 50
    x = np.arange(-n/2, n/2, 1, dtype=np.float64)

    # y is drawn uniformly at random, independent of x
    y = np.random.uniform(-15, 15, (n,))

    print('x', x)
    print('y', y)

    plt.scatter(x, y, s=None, marker='o', color='g', edgecolors='g', alpha=0.9, label="Random numbers")
    plt.grid(color='black', linestyle='-', linewidth=0.5)
    plt.legend(loc=2)
    plt.axis('scaled')
    # Save before show(): after the plot window is closed the saved figure can be blank
    plt.savefig('numpy_data_random_01.jpeg')
    plt.show()

    print('*** Program ended ***')

You can use this data as input when training your model.
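
As a quick sanity check before training anything, a straight line can be fitted to both kinds of data. The sketch below is one possible way to do that with `np.polyfit` and `np.corrcoef`; the seed and the variable names are assumptions made for this example, and the data recipes mirror the two scripts above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility
n = 50
x = np.arange(-n / 2, n / 2, 1, dtype=np.float64)

# Same recipes as the two scripts above
y_linear = x * rng.uniform(0.3, 0.5, n) + rng.uniform(5, 10, n)
y_random = rng.uniform(-15, 15, n)

# Least-squares straight line: for deg=1, polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y_linear, 1)

# Pearson correlation: close to 1 for the trend data,
# usually much closer to 0 for the random data
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_random = np.corrcoef(x, y_random)[0, 1]

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"r_linear={r_linear:.3f} r_random={r_random:.3f}")
```

The fitted slope should land inside the 0.3 to 0.5 range the slopes were drawn from, while the random data should show no comparable correlation.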

## Generating Random Numbers With NumPy

We often need test data or a handful of random numbers. NumPy is very effective at generating random integers, random floats, and random values between 0 and 1. The values are pseudo-random, and you can draw them from a uniform distribution as well as from a normal or binomial distribution.

The following program shows several ways of creating random numbers:

    import numpy as np

    # Random values in a given shape.
    a = np.random.rand(2, 3)
    print(a)
    # [[ 0.7524278   0.21176809  0.73990734]
    #  [ 0.28341776  0.11559792  0.15859365]]
    print(type(a))        # <class 'numpy.ndarray'>
    print(type(a[0]))     # <class 'numpy.ndarray'>
    print(type(a[0][0]))  # <class 'numpy.float64'>

    # Return a sample from the standard normal distribution.
    a = np.random.randn(5)
    print(a)
    # [ 1.0000366   0.0906066  -0.05027158 -0.14745128  1.35046138]
    print(type(a))     # <class 'numpy.ndarray'>
    print(type(a[0]))  # <class 'numpy.float64'>

    # Return a 5x4 array of samples from the standard normal distribution.
    a = np.random.randn(5, 4)
    print(a)
    # [[ 1.48864593 -0.75508993  1.57585151 -0.02507804]
    #  [-1.11795072  0.16357727  0.76753395  0.02291213]
    #  [-1.39439533  0.66704929 -0.01020978  0.12887067]
    #  [-0.19386682  0.70650588  0.71049381 -0.40089744]
    #  [-0.6845585   0.35872981  0.18581329 -0.51889034]]
    print(type(a))        # <class 'numpy.ndarray'>
    print(type(a[0]))     # <class 'numpy.ndarray'>
    print(type(a[0][0]))  # <class 'numpy.float64'>

    # Return random integers from low (inclusive) to high (exclusive).
    a = np.random.randint(2, 14, size=5)
    print(a)
    # [12  9  7  3  9]
    print(type(a))     # <class 'numpy.ndarray'>
    print(type(a[0]))  # <class 'numpy.int64'>

    # Return random floats in the half-open interval [0.0, 1.0).
    a = np.random.random_sample(5)
    print(a)
    # prints five floats, each in [0.0, 1.0); negative values cannot occur
    print(type(a))     # <class 'numpy.ndarray'>
    print(type(a[0]))  # <class 'numpy.float64'>

    # Draw samples from a binomial distribution.
    n, p = 10, 0.5  # number of trials, probability of success in each trial
    a = np.random.binomial(n, p, 100)
    print(a)
    # [5 2 5 6 5 3 6 8 5 7 3 4 8 4 9 5 5 6 3 5 7 6 6 2 6 5 6 6 5 3 6 5 6 6 4 6 2
    #  7 5 6 7 6 3 3 3 8 8 3 2 5 7 6 4 2 5 7 6 4 5 6 5 5 5 7 4 2 8 3 5 3 6 5 4 4
    #  3 3 5 7 7 4 4 6 4 5 6 7 5 5 6 6 4 7 4 4 3 2 6 6 7 3]
    print(type(a))     # <class 'numpy.ndarray'>
    print(type(a[0]))  # <class 'numpy.int64'>

    # Draw samples from a uniform distribution.
    a = np.random.uniform(5, 15, (3,))
    print(a)
    # [ 13.81416285   5.82087405  13.24553233]
    print(type(a))     # <class 'numpy.ndarray'>
    print(type(a[0]))  # <class 'numpy.float64'>
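
The functions above belong to NumPy's legacy random interface. Since NumPy 1.17 there is also a `Generator` object, created with `np.random.default_rng`, which offers the same kinds of draws and can be seeded for repeatable results. The sketch below shows rough equivalents of the calls above; the seed value and variable names are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed for repeatable output

uniform_01 = rng.random((2, 3))                 # floats in [0.0, 1.0)
normal = rng.standard_normal(5)                 # standard normal samples
integers = rng.integers(2, 14, size=5)          # ints in [2, 14)
binom = rng.binomial(10, 0.5, size=100)         # 10 trials, p = 0.5
uniform_range = rng.uniform(5, 15, size=3)      # floats in [5, 15)

# A second generator with the same seed reproduces the same first draw
rng_again = np.random.default_rng(12345)
same_again = np.array_equal(uniform_01, rng_again.random((2, 3)))

print(uniform_01.shape, integers, uniform_range)
print(same_again)  # True
```

Seeding is handy for test data, because a failing test can then be rerun on exactly the same numbers.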


## Python NumPy Tutorial : Getting started with NumPy

NumPy is a BSD-licensed, fundamental package for scientific computing with Python. Its most important features are a powerful N-dimensional array object and sophisticated (broadcasting) functions. It also has useful linear algebra, Fourier transform, and random number capabilities.
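
To make these features concrete, here is a minimal sketch that touches broadcasting, linear algebra and the Fourier transform. The arrays are made-up values chosen only so the results are easy to verify by hand.

```python
import numpy as np

matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
row = np.array([10.0, 20.0])

# Broadcasting: the 1-D row is added to every row of the 2-D matrix
shifted = matrix + row
print(shifted)  # [[11. 22.] [13. 24.]]

# Linear algebra: determinant (approximately -2.0) and inverse
det = np.linalg.det(matrix)
inverse = np.linalg.inv(matrix)

# Fourier transform: the FFT of a unit impulse is all ones
spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))

print(det)
print(spectrum)
```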

NumPy can also be used as an efficient multi-dimensional container of generic data. One of its most important features is that arbitrary data types can be defined.
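
As an illustration of arbitrary data types, the sketch below defines a structured dtype. The record layout (`label`, `x`, `y`) is hypothetical and exists only for this example.

```python
import numpy as np

# Hypothetical record layout: a short text label plus two coordinates
point_dtype = np.dtype([("label", "U10"), ("x", np.float64), ("y", np.float64)])

points = np.array([("a", 1.0, 2.0), ("b", 3.5, -1.0)], dtype=point_dtype)

print(points["x"])          # all x values, accessed by field name
print(points[1]["label"])   # label of the second record

# Plain arrays can also use a specific element type and size
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)  # int8 1
```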

### Importing NumPy and creating, accessing and modifying arrays

    >>> import numpy as np
    >>> a=np.array([1,2,3,4,5,6])
    >>> a
    array([1, 2, 3, 4, 5, 6])
    >>> type(a)
    <class 'numpy.ndarray'>
    >>> a[1]
    2
    >>> a[1]=9
    >>> a
    array([1, 9, 3, 4, 5, 6])
    >>> b = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
    >>> b
    array([[ 1,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12]])
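
Beyond single-index access, two-dimensional arrays support row/column indexing, slicing and boolean masks. A short sketch using the same `b` array as above:

```python
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

element = b[1, 2]        # row 1, column 2
last_row = b[-1]         # negative indices count from the end
first_col = b[:, 0]      # every row, column 0
evens = b[b % 2 == 0]    # boolean mask selects matching elements (1-D result)

print(element)    # 7
print(last_row)   # [ 9 10 11 12]
print(first_col)  # [1 5 9]
print(evens)      # [ 2  4  6  8 10 12]
```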

### NumPy array properties

    >>> a.shape
    (6,)
    >>> b.shape
    (3, 4)
    >>> a.size
    6
    >>> b.size
    12
    >>> a.data
    <memory at 0x7faf92bb0dc8>
    >>> b.data
    <memory at 0x7faf92bbba68>
    >>> a.dtype
    dtype('int64')
    >>> b.dtype
    dtype('int64')
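
A few more properties are often useful alongside `shape`, `size` and `dtype`. In the sketch below the dtype is set explicitly to `np.int64` so the byte counts do not depend on the platform default.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6], dtype=np.int64)
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=np.int64)

print(a.ndim, b.ndim)      # 1 2  (number of dimensions)
print(a.itemsize)          # 8    (bytes per element for int64)
print(a.nbytes, b.nbytes)  # 48 96 (size * itemsize)

# astype returns a converted copy; the original array keeps its dtype
a_float = a.astype(np.float32)
print(a_float.dtype, a.dtype)  # float32 int64
```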

### Mathematical operations on NumPy arrays

    >>> a = np.array([1, 2, 3, 4, 5, 6])  # reset a after the earlier a[1]=9
    >>> c = a * 3
    >>> a
    array([1, 2, 3, 4, 5, 6])
    >>> b
    array([[ 1,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12]])
    >>> c
    array([ 3,  6,  9, 12, 15, 18])
    >>> a+c
    array([ 4,  8, 12, 16, 20, 24])
    >>> a-c
    array([ -2,  -4,  -6,  -8, -10, -12])
    >>> a*c
    array([  3,  12,  27,  48,  75, 108])
    >>> a/c
    array([0.33333333, 0.33333333, 0.33333333, 0.33333333, 0.33333333,
           0.33333333])
    >>> np.sqrt(a)
    array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
           2.44948974])
    >>> np.sum(b,axis=0)
    array([15, 18, 21, 24])
    >>> np.sum(b,axis=1)
    array([10, 26, 42])
    >>> b
    array([[ 1,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12]])
    >>> b.T
    array([[ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11],
           [ 4,  8, 12]])
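
Note that `*` on arrays multiplies element by element. A true matrix product uses the `@` operator instead. The sketch below contrasts the two and adds a couple of per-axis reductions; the variable names are chosen for this example only.

```python
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

elementwise = b * b   # same shape as b, every entry squared
gram = b @ b.T        # matrix product of (3, 4) and (4, 3), shape (3, 3)

col_max = b.max(axis=0)    # maximum of each column
row_mean = b.mean(axis=1)  # mean of each row

# Broadcasting: each column is scaled by its own factor
scaled = b * np.array([1, 10, 100, 1000])

print(elementwise.shape, gram.shape)  # (3, 4) (3, 3)
print(col_max)                        # [ 9 10 11 12]
print(row_mean)                       # [ 2.5  6.5 10.5]
print(scaled[0])                      # [   1   20  300 4000]
```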

### Working on the arrays

    >>> b
    array([[ 1,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12]])
    >>> b.reshape(4,3)  # returns a reshaped array; b itself is unchanged
    array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])
    >>> b
    array([[ 1,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12]])
    >>> b.resize(4,3)  # changes the shape of b in place
    >>> b
    array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])
    >>> b[0:2,1:2]  # [row range, column range]
    array([[2],
           [5]])
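
One detail worth knowing when reshaping and slicing: the result usually shares memory with the original array, so writing to one changes the other. The following sketch demonstrates this on a freshly created array; the variable names are illustrative.

```python
import numpy as np

b = np.arange(1, 13).reshape(3, 4)

# reshape returns a new array object; for contiguous data it is a view
view = b.reshape(4, 3)
print(b.shape, view.shape)        # (3, 4) (4, 3)
print(np.shares_memory(b, view))  # True

# -1 asks NumPy to infer that dimension from the array size
flat = b.reshape(-1)
print(flat.shape)                 # (12,)

# Basic slices are views too: writing through the slice changes b
column = b[0:2, 1:2]
column[0, 0] = 99
print(b[0, 1])                    # 99

# .copy() gives independent data
safe = b[0:2, 1:2].copy()
safe[0, 0] = -1
print(b[0, 1])                    # 99 (the copy did not touch b)

# resize changes the array in place and returns None
c = np.arange(6)
result = c.resize(2, 3)
print(c.shape, result)            # (2, 3) None
```

If an independent array is needed after reshaping or slicing, calling `.copy()` on the result avoids accidental changes to the original.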