Generating data for Linear Regression using NumPy

We have already seen how to generate random numbers in a previous article. Here we look at how to generate data in a specific format for linear regression.

To test linear regression, we need one data set that has a roughly linear relationship and one set of random data. The code below generates both kinds of data using Python and NumPy. The graphs will help you understand the data these programs create.

Data with Linear Trend for Linear Regression
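
As a minimal sketch of how data with a linear trend could be produced, the snippet below adds Gaussian noise to a straight line. The slope, intercept, noise level, seed and point count are hypothetical values chosen only for illustration; they are not taken from the original program.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
rng = np.random.default_rng(seed=42)   # seeded so the run is repeatable
n_points = 100
slope, intercept, noise_std = 2.5, 10.0, 3.0

x = np.linspace(0, 50, n_points)                    # evenly spaced inputs
noise = rng.normal(0.0, noise_std, size=n_points)   # Gaussian scatter around the line
y_linear = slope * x + intercept + noise            # linear trend plus noise

print(x[:3], y_linear[:3])
print(np.corrcoef(x, y_linear)[0, 1])   # close to 1 for a strong linear trend
```

Raising noise_std spreads the points further from the line and lowers the correlation.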

Data without any Trend for Linear Regression
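
For comparison, data without any trend can be sketched by drawing the y values independently of x. The value range, seed and point count below are assumptions made for this example.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
rng = np.random.default_rng(seed=7)
n_points = 100

x = np.linspace(0, 50, n_points)
y_random = rng.uniform(0.0, 100.0, size=n_points)   # drawn independently of x

print(x[:3], y_random[:3])
print(np.corrcoef(x, y_random)[0, 1])   # expected to be near 0: no linear relation
```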

You can use this data as input when training your model.
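
One way to try such data out, sketched here with np.polyfit standing in for whatever model you actually train, is to split it into training and test parts and fit a straight line. The 80/20 split and all parameter values are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic linear data (hypothetical slope 2.5, intercept 10, noise std 3).
x = np.linspace(0, 50, 200)
y = 2.5 * x + 10.0 + rng.normal(0.0, 3.0, size=x.size)

# Simple hold-out split: shuffle the indices and keep 80% for training.
idx = rng.permutation(x.size)
train, test = idx[:160], idx[160:]

# A degree-1 polyfit is an ordinary least-squares straight-line fit.
fit_slope, fit_intercept = np.polyfit(x[train], y[train], 1)

# Evaluate on the held-out points.
pred = fit_slope * x[test] + fit_intercept
rmse = np.sqrt(np.mean((pred - y[test]) ** 2))
print(fit_slope, fit_intercept, rmse)
```

The fitted slope and intercept should land near the values used to generate the data, and the error on the held-out points should be close to the noise level.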

Generating Random Numbers With NumPy

We often need some data for testing, or simply some random numbers. NumPy is very effective at generating random integers, random floats, and random values between 0 and 1. The values are pseudo-random, and you can draw them from a uniform distribution as well as from a normal distribution.

The following program shows several ways of creating random numbers for use in a program.
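
A minimal sketch of such a program is given below. It uses NumPy's Generator interface (np.random.default_rng); the seed, ranges and sizes are arbitrary values picked for illustration.

```python
import numpy as np

# The seed is optional; it only makes the run repeatable.
rng = np.random.default_rng(seed=123)

ints = rng.integers(low=1, high=10, size=5)           # integers in [1, 10)
unit = rng.random(size=5)                             # floats in [0.0, 1.0)
floats = rng.uniform(low=5.0, high=15.0, size=5)      # floats in [5.0, 15.0)
normal = rng.normal(loc=0.0, scale=1.0, size=(2, 3))  # normal distribution, mean 0, std 1
pick = rng.choice([10, 20, 30, 40], size=2, replace=False)  # sample without repeats

print(ints, unit, floats, normal, pick, sep="\n")
```

Older code often uses the legacy functions np.random.randint, np.random.rand and np.random.randn, which cover the same cases.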

Python NumPy Tutorial : Getting started with NumPy

NumPy is a BSD-licensed fundamental package for scientific computing with Python. Its most important feature is a powerful N-dimensional array object, along with sophisticated (broadcasting) functions. It also has useful linear algebra, Fourier transform, and random number capabilities.

NumPy can also be used as an efficient multi-dimensional container of generic data. One of its most important features is support for arbitrary data types.

Importing NumPy and creating, accessing, and modifying arrays
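
To make these capabilities a little more concrete, here is a small sketch that touches broadcasting, linear algebra and the Fourier transform. The arrays are made-up values used only for illustration.

```python
import numpy as np

# Broadcasting: the 1-D row is stretched across every row of the 2-D array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
print(grid + row)   # [[11 22 33]
                    #  [14 25 36]]

# Linear algebra: solve m @ v = rhs for v.
m = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
v = np.linalg.solve(m, rhs)
print(v)            # approximately [0.8 1.4]

# Fourier transform of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(spectrum)
```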
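
As an illustration of arbitrary data types, the sketch below defines a structured dtype so that each array element holds a small record. The field names and values are hypothetical.

```python
import numpy as np

# Hypothetical record layout: a name, an age and a weight per element.
person = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])
people = np.array([("Asha", 31, 58.5), ("Ravi", 45, 72.0)], dtype=person)

print(people["name"])               # one field across all records
print(people[people["age"] > 40])   # records filtered by a field
print(people["age"].mean())         # 38.0
```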

>>> import numpy as np
>>> a=np.array([1,2,3,4,5,6])
>>> a
array([1, 2, 3, 4, 5, 6])
>>> type(a)
<class 'numpy.ndarray'>
>>> a[1]
2
>>> a[1]=9
>>> a
array([1, 9, 3, 4, 5, 6])
>>> b = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
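
The session above only indexes the one-dimensional array. As a short sketch, the same ideas apply to the two-dimensional array b by giving a row index and a column index:

```python
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(b[1, 2])    # row 1, column 2: 7
print(b[0])       # first row: [1 2 3 4]
print(b[:, -1])   # last column: [ 4  8 12]

b[2, 0] = 90      # modify a single element in place
print(b[2])       # [90 10 11 12]
```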

NumPy array properties

>>> a.shape
(6,)
>>> b.shape
(3, 4)
>>> a.size
6
>>> b.size
12
>>> a.data
<memory at 0x7faf92bb0dc8>
>>> b.data
<memory at 0x7faf92bbba68>
>>> a.dtype
dtype('int64')
>>> b.dtype
dtype('int64')
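
A few related attributes are worth knowing as well. In the sketch below the dtype is set explicitly, since the default integer type can differ between platforms; the choice of int64 and float32 is only for illustration.

```python
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=np.int64)

print(b.ndim)       # number of dimensions: 2
print(b.itemsize)   # bytes per element: 8 for int64
print(b.nbytes)     # total bytes: 12 elements * 8 = 96

f = b.astype(np.float32)    # astype returns a converted copy
print(f.dtype, f.itemsize)  # float32 4
```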


Mathematical operations on NumPy arrays

>>> a = np.array([1,2,3,4,5,6])
>>> a
array([1, 2, 3, 4, 5, 6])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> c = np.array([3,6,9,12,15,18])
>>> c
array([ 3,  6,  9, 12, 15, 18])
>>> a+c
array([ 4,  8, 12, 16, 20, 24])
>>> a-c
array([ -2,  -4,  -6,  -8, -10, -12])
>>> a*c
array([  3,  12,  27,  48,  75, 108])
>>> a/c
array([0.33333333, 0.33333333, 0.33333333, 0.33333333, 0.33333333,
       0.33333333])
>>> np.sqrt(a)
array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
       2.44948974])
>>> np.sum(b,axis=0)
array([15, 18, 21, 24])
>>> np.sum(b,axis=1)
array([10, 26, 42])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> b.T
array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])
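
The axis argument can be confusing at first: axis=0 collapses the rows, giving one result per column, while axis=1 collapses the columns, giving one result per row. The sketch below repeats that with a few more reductions and also contrasts element-wise multiplication with the matrix product operator @.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
c = np.array([3, 6, 9, 12, 15, 18])
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(b.sum())          # all elements: 78
print(b.sum(axis=0))    # one sum per column: [15 18 21 24]
print(b.mean(axis=1))   # one mean per row: [ 2.5  6.5 10.5]

print(a * c)            # element-wise product, same shape as a
print(a @ c)            # dot product, a single number: 273
print((b @ b.T).shape)  # (3, 4) times (4, 3) gives (3, 3)
```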

Working on the arrays

>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> b.reshape(4,3)
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> b.resize(4,3)
>>> b
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> b[0:2,1:2] #[row range,column range]
array([[2],
       [5]])
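
As the session shows, reshape returns a new array object and leaves b unchanged, while resize changes b itself. One further detail, sketched below, is that a reshaped array or a slice normally shares its data with the original, so writing through one is visible in the other. The variable names r, s and s_copy are made up for this example.

```python
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

r = b.reshape(4, 3)             # new array object; b keeps its shape
print(b.shape, r.shape)         # (3, 4) (4, 3)
print(np.shares_memory(b, r))   # True: r is a view on b's data

r[0, 0] = 100                   # writing through the view...
print(b[0, 0])                  # ...changes b as well: 100

s = b[0:2, 1:3]                 # rows 0-1, columns 1-2 (end index is excluded)
s_copy = s.copy()               # copy() gives independent data
s_copy[0, 0] = -1
print(b[0, 1])                  # still 2

flat = b.reshape(-1)            # -1 lets NumPy infer the length
print(flat.shape)               # (12,)
```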