## How to Check if a File Exists using Python

Recently I was working on file generation using Python. Before generating any file, I had to check whether the file already existed, so I needed a way to do that in the program. As usual, this task is very easy in Python.

There are multiple ways to check whether a file exists in Python. Here is one using the “os” module.

The os.path module has functions such as isfile, isdir and exists, which help us check whether a file or directory exists.
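The original script is not shown here, so the following is a minimal sketch of such a check. It assumes a temporary file and directory created with tempfile stand in for your real paths (the file names are hypothetical), and the prints are arranged to resemble the output below.

```python
import os
import tempfile

# Create a temporary directory and an empty file inside it so the checks
# below have real paths to look at (names are hypothetical).
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "sample.txt")
open(file_path, "w").close()
missing_path = os.path.join(tmp_dir, "missing.txt")

# isfile() is True only for an existing regular file
print("file_exists :", os.path.isfile(file_path))    # True
print("file_exists :", os.path.isfile(tmp_dir))      # False, it is a directory
# isdir() is True only for an existing directory
print("dir_exists :", os.path.isdir(file_path))      # False, it is a file
print("dir_exists :", os.path.isdir(tmp_dir))        # True
print("dir_exists :", os.path.isdir(missing_path))   # False, nothing there
# exists() is True for any existing path, file or directory
print("exists :", os.path.exists(file_path))         # True
print("exists :", os.path.exists(tmp_dir))           # True
```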

Here is the output

$ python3.6 file_exists_01.py
file_exists : True
file_exists : False
dir_exists : False
dir_exists : True
dir_exists : False
exists : True
exists : True

If you use isfile() on a directory, the result will be False, so use isfile or isdir depending on what you need to check. Alternatively, you can use the exists function, which returns True if the given path exists, whether it is a file or a directory. Please refer to the os.path documentation for further details.

Here is another way to check whether a file exists, using the pathlib module.

Output is

$ python3.6 file_exists_02.py
var : False
var : True
var : True
var : False
var : True
var : True
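As with the os.path example, the original pathlib script is not shown. A minimal sketch using Path.is_file(), Path.is_dir() and Path.exists() might look like this; the temporary paths are assumptions so that it runs on any machine.

```python
import tempfile
from pathlib import Path

# Temporary directory holding one empty file (hypothetical name)
tmp_dir = Path(tempfile.mkdtemp())
file_path = tmp_dir / "sample.txt"
file_path.touch()                    # create the empty file
missing_path = tmp_dir / "missing.txt"

print("is_file :", file_path.is_file())    # True for an existing regular file
print("is_file :", tmp_dir.is_file())      # False, it is a directory
print("is_dir :", tmp_dir.is_dir())        # True for an existing directory
print("is_dir :", file_path.is_dir())      # False, it is a file
print("exists :", file_path.exists())      # True for any existing path
print("exists :", missing_path.exists())   # False, nothing there
```

The Path methods map one-to-one onto os.path.isfile, os.path.isdir and os.path.exists, so the choice between the two modules is mostly a matter of style.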

Both modules offer similar features, so you can choose whichever is more convenient for you.

## How to read an image and get its attributes using Pillow in Python

Pillow is the friendly PIL fork, and PIL is the Python Imaging Library. This is the first article in a series of image processing articles using Python.

Here is the simplest program to read an image file using Pillow and get its basic attributes.
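The program itself is not included in this post, so here is a minimal sketch that prints the same attributes. To stay self-contained it assumes an in-memory JPEG generated with Image.new() instead of a real photo; swap in a path to your own image file (for example a hypothetical "sample.jpg").

```python
import io

from PIL import Image

print("*** Program Started ***")

# Assumption: build a JPEG in memory so the sketch runs anywhere.
# Replace `buffer` with a file path to read your own image.
buffer = io.BytesIO()
Image.new("RGB", (1920, 1285), color=(200, 120, 40)).save(buffer, format="JPEG")
buffer.seek(0)

im = Image.open(buffer)        # opens the image and reads its header
print("im object:", im)
print("format :", im.format)   # file format detected by Pillow
print("size :", im.size)       # (width, height) tuple
print("mode :", im.mode)       # pixel format, e.g. RGB
print("width :", im.width)
print("height :", im.height)
print("info :", im.info)       # format-specific metadata dictionary

print("*** Program Ended ***")
```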

Output of this program

*** Program Started ***
im object: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1920x1285 at 0x7F7FB79F6518>
format : JPEG
size : (1920, 1285)
mode : RGB
width : 1920
height : 1285
info : {'jfif': 257, 'jfif_version': (1, 1), 'jfif_unit': 0, 'jfif_density': (1, 1), 'progressive': 1, 'progression': 1}
*** Program Ended ***

## How to check file size in Python

Many times while processing files in Python, we need to know a file's size in bytes, KB or MB. There are multiple ways to get the file size; the following are two simple methods using the os module.
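Since the script is not shown, here is a minimal sketch of the two os-based methods, os.path.getsize() and os.stat().st_size, together with an empty-file check. The input files are hypothetical ones created in a temporary directory, and the KB line simply divides the byte count by 1024.

```python
import os
import tempfile


def print_file_size(path):
    """Print the size of `path` using two os-module methods and return it in bytes."""
    # Method 1: os.path.getsize() returns the size in bytes
    size_getsize = os.path.getsize(path)
    # Method 2: os.stat() returns a stat result whose st_size field is the size in bytes
    size_stat = os.stat(path).st_size

    # A size of 0 bytes means the file is empty
    if size_getsize == 0:
        print("Input file is empty")
    else:
        print("Input file is not empty")
    print("File size (in Bytes) :", size_getsize)
    print("File size (in Bytes) :", size_stat)
    print("File size (in KB) :", size_getsize / 1024)
    return size_getsize


print("*** Program Started ***")
# Hypothetical inputs: a 2048-byte file and an empty file
tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "data.bin")
empty_path = os.path.join(tmp_dir, "empty.txt")
with open(data_path, "wb") as f:
    f.write(b"x" * 2048)
open(empty_path, "w").close()

print_file_size(data_path)
print_file_size(empty_path)
print("*** Program Ended ***")
```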

I ran this program on two inputs: an image and an empty file.

Here is the output of the program

$ python3.6 file_size.py
*** Program Started ***
Input file is not empty
File size (in Bytes) : 147162
File size (in Bytes) : 147162
Input file is empty
File size (in Bytes) : 0
File size (in Bytes) : 0
*** Program Ended ***

[Figure: Actual file size]

I have added a check to see whether the file is empty, because you might need to check the file size before doing any processing on a file.

## Generating data for Linear Regression using NumPy

We have already seen how to generate random numbers in a previous article; here we will look at how to generate data in a specific format for linear regression. To test linear regression, we need one data set with a roughly linear relationship and one set of random data.

Please find below the code to generate data with a linear relationship, and random data, using Python and NumPy. I have provided graphs which will help you understand the data created by these programs.

### Data with Linear Trend for Linear Regression

### Data without any Trend for Linear Regression

You can use this as input data while training your model.

## Python NumPy Tutorial : Getting started with NumPy

NumPy is the fundamental package for scientific computing with Python, and it is BSD licensed. Its most important feature is a powerful N-dimensional array object, along with sophisticated (broadcasting) functions. It also has useful linear algebra, Fourier transform, and random number capabilities.

NumPy can also be used as an efficient multi-dimensional container of generic data. One of its most important features is support for arbitrary data types.
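As a small illustration of arbitrary data types, NumPy lets you define a structured dtype whose elements are records with named fields. The field names and values below are made up for the example.

```python
import numpy as np

# A structured dtype: each element holds a name (up to 10 unicode characters),
# an integer age and a float weight. The fields are illustrative only.
person = np.dtype([("name", "U10"), ("age", np.int32), ("weight", np.float64)])

people = np.array([("Alice", 30, 55.5), ("Bob", 42, 80.2)], dtype=person)

print(people["name"])         # one field across all elements
print(people[1])              # one element is a record with all three fields
print(people["age"].mean())   # fields behave like ordinary numeric arrays
```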
Importing NumPy, and creating, accessing and modifying an array

>>> import numpy as np
>>> a=np.array([1,2,3,4,5,6])
>>> a
array([1, 2, 3, 4, 5, 6])
>>> type(a)
<class 'numpy.ndarray'>
>>> a[1]
2
>>> a[1]=9
>>> a
array([1, 9, 3, 4, 5, 6])
>>> b = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

NumPy properties

>>> a.shape
(6,)
>>> b.shape
(3, 4)
>>> a.size
6
>>> b.size
12
>>> a.data
<memory at 0x7faf92bb0dc8>
>>> b.data
<memory at 0x7faf92bbba68>
>>> a.dtype
dtype('int64')
>>> b.dtype
dtype('int64')

Mathematical operations on NumPy arrays

>>> a = np.array([1,2,3,4,5,6])
>>> c = a * 3
>>> a
array([1, 2, 3, 4, 5, 6])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> c
array([ 3,  6,  9, 12, 15, 18])
>>> a+c
array([ 4,  8, 12, 16, 20, 24])
>>> a-c
array([ -2,  -4,  -6,  -8, -10, -12])
>>> a*c
array([  3,  12,  27,  48,  75, 108])
>>> a/c
array([0.33333333, 0.33333333, 0.33333333, 0.33333333, 0.33333333,
       0.33333333])
>>> np.sqrt(a)
array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
       2.44948974])
>>> np.sum(b,axis=0)
array([15, 18, 21, 24])
>>> np.sum(b,axis=1)
array([10, 26, 42])
>>> b.T
array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

Working on the arrays

>>> b.reshape(4,3)
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> b.resize(4,3)
>>> b
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> b[0:2,1:2] #[row range,column range]
array([[2],
       [5]])

Note that reshape() returns a new array and leaves b unchanged, while resize() modifies b in place.

## Web scraping using BeautifulSoup and Python

The Python programming language has an ecosystem of modules and tools that can be used for scraping data from websites. In this article we will focus on the Beautiful Soup module.

#### Step#1 Install beautifulsoup and other required modules

To get started, you need a few other modules, such as requests and lxml, to use BeautifulSoup.
Install the required modules as below:

pip install beautifulsoup4
pip install requests
pip install lxml

#### Step#2 Understand the web page html tags structure

Let us try to scrape this Wikipedia page: https://en.wikipedia.org/wiki/List_of_programming_languages

Some observations from the web page structure:

1. There is only one h1 element, and it is the page title
2. There are multiple h2 elements
3. Each h2 element has an unordered list
4. Some tags have attributes such as id, class etc.

#### Step#3 Fetch the required web page

Before fetching the required value, we need to fetch the whole web page. This is achieved using the requests module.

from bs4 import BeautifulSoup
import requests

url = 'https://en.wikipedia.org/wiki/List_of_programming_languages'
r = requests.get(url)
data = r.text
print(data)

#### Step#4 Parse the web page

Now we have the page loaded in html format in a variable called “data”. If you look at the printed output, it contains html tags. To access the required tag, we need to parse this html. This is where BeautifulSoup comes into the picture.

soup = BeautifulSoup(data, features = "lxml")

#### Step#5 Search the required html tag

The variable named soup holds the html tags in a format which can be searched. Let us assume we want to see the header (h1 tag) content. Here is the final code.

Output

header : <h1 class="firstHeading" id="firstHeading" lang="en">List of programming languages</h1>
header Text: List of programming languages

Please note that we need to use the .text attribute to get the content of the tag. Here is another example with some additional details.

## How to get rid of “No parser was explicitly specified” while using beautifulsoup python

While parsing a page using BeautifulSoup, I got the following warning. Although I ignored it for some time, it became distracting to see this warning every time I ran my program.

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 12 of the file filename.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

There is nothing wrong with this warning and you can continue coding; however, I wanted to fix it for the following reasons:

1. It is distracting to see this warning every time I run my program.
2. If I run my program on some other machine, it might not behave as expected, since BeautifulSoup will pick whichever parser is available there.

Besides these two primary reasons, it bothers me to see unformatted code or unnecessary warnings. Many times I have burned my fingers while fixing warnings (read: I was able to fix the warning, but it led to errors and the whole process ate a considerable amount of my time). Don’t worry, fixing the above warning will not lead to an error.

Before fixing this warning, install lxml:

pip install lxml

To fix this warning, simply replace the following line

soup = BeautifulSoup(data)

with this line

soup = BeautifulSoup(data, features = "lxml")

Now run your program and it will run without any warning.

## How to Install correct django version

Django has multiple supported versions, and the differences between them can cause confusion. If you want to install a specific version of Django, follow the steps below.

### Step#1 Check what is the current version

$ python3.6 -m django --version
2.0


OR

$ python3.6
Python 3.6.4 (default, Jan 13 2018, 12:02:51)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import django
>>> django.__version__
'1.11'
>>> exit()

If the current version is the expected version, then you don’t need any of the steps below.

### Step#2 Uninstall current version

Please note that you don’t have to specify a version while uninstalling; whatever the current version is, it will be uninstalled.

$ sudo python3.6 -m pip uninstall django
Uninstalling Django-1.11:
Would remove:
/usr/local/lib/python3.6/site-packages/Django-1.11.dist-info/*
/usr/local/lib/python3.6/site-packages/django/*
Proceed (y/n)? y
Successfully uninstalled Django-1.11


### Step#3 Install expected version

$ sudo python3.6 -m pip install django
Collecting django
Downloading https://files.pythonhosted.org/packages/ab/15/cfde97943f0db45e4f999c60b696fbb4df59e82bbccc686770f4e44c9094/Django-2.0.7-py3-none-any.whl (7.1MB)
100% |████████████████████████████████| 7.1MB 103kB/s
Requirement already satisfied: pytz in /usr/local/lib/python3.6/site-packages (from django) (2017.3)
Installing collected packages: django
Successfully installed django-2.0.7

OR use the command below if you want to install a specific version

$ sudo python3.6 -m pip install django==2.0
Collecting django==2.0
100% |████████████████████████████████| 7.1MB 133kB/s
Requirement already satisfied: pytz in /usr/local/lib/python3.6/site-packages (from django==2.0) (2017.3)
Installing collected packages: django
Successfully installed django-2.0

And you are all set. Please let me know if you face any issues while installing the expected version.

## How to add points to a time series graph to show buy/sell signals – matplotlib

While doing time series analysis, you can show buy/sell signals on the graph itself.

Following is the script for doing the same.
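The original script is not reproduced here, so the following is a sketch of the idea under a few assumptions: the price data is a synthetic random walk, the DataFrame columns are named 'Date' and 'Close Price' (as in the traceback below), and a buy signal is taken to be SMA5 crossing above SMA12 while a sell is SMA5 crossing below it. The Agg backend is used so the chart renders without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to show the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily closing prices (a seeded random walk) stand in for real data
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=120, freq="D"),
    "Close Price": 1400 + rng.normal(0, 10, size=120).cumsum(),
})

# Short and long simple moving averages
df["SMA5"] = df["Close Price"].rolling(window=5).mean()
df["SMA12"] = df["Close Price"].rolling(window=12).mean()

# 1 while SMA5 is above SMA12, else 0 (NaN comparisons count as 0)
position = (df["SMA5"] > df["SMA12"]).astype(int)
change = position.diff()               # +1 = crossed up, -1 = crossed down
valid = df["SMA12"].shift().notna()    # skip warm-up rows before SMA12 exists
df["Buy"] = (change == 1) & valid
df["Sell"] = (change == -1) & valid

buys = df.loc[df["Buy"]]
sells = df.loc[df["Sell"]]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df["Date"].values, df["Close Price"].values, label="Close Price")
ax.plot(df["Date"].values, df["SMA5"].values, label="SMA5")
ax.plot(df["Date"].values, df["SMA12"].values, label="SMA12")
# Signal markers; .values passes plain NumPy arrays to scatter()
ax.scatter(buys["Date"].values, buys["Close Price"].values,
           color="green", marker="^", s=60, label="Buy")
ax.scatter(sells["Date"].values, sells["Close Price"].values,
           color="red", marker="v", s=60, label="Sell")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # or fig.savefig("signals.png") to write a file
print("buy signals:", int(df["Buy"].sum()), "sell signals:", int(df["Sell"].sum()))
```

Because the position only flips between 0 and 1, buy and sell markers always alternate, which makes whipsaws easy to spot on the chart.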

While working on this code, I wasted quite a lot of time on the error below.

Traceback (most recent call last):
File "timeseries_simple_with_pointer.py", line 35, in <module>
plt.scatter(df.loc[df['SMA20'] >1400.0 , 'Date'],df.loc[df['SMA20'] >1400.0, 'Close Price'], label='skitscat', color='red', s=25, marker="<")
File "/usr/local/lib/python3.6/site-packages/matplotlib/pyplot.py", line 3378, in scatter
edgecolors=edgecolors, data=data, **kwargs)
File "/usr/local/lib/python3.6/site-packages/matplotlib/__init__.py", line 1717, in inner
return func(ax, *args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 4023, in scatter
offsets = np.column_stack([x, y])
File "/usr/local/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 369, in column_stack
return _nx.concatenate(arrays, 1)
TypeError: invalid type promotion

On further research I found that, in the versions I was using, matplotlib's scatter plot did not accept these pandas Series directly, so I had to convert them into NumPy arrays using .values to make it work.
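A minimal sketch of that workaround, using a hypothetical DataFrame shaped like the one in the traceback: select the rows with a boolean mask, then pass .values (NumPy arrays) rather than pandas Series to plt.scatter(). Newer matplotlib and pandas releases may accept the Series directly, so treat this as the fix that worked in this setup rather than a strict requirement.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with the same column names as in the traceback
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=30, freq="D"),
    "Close Price": np.linspace(1350.0, 1460.0, 30),
})
df["SMA20"] = df["Close Price"].rolling(window=20).mean()

mask = df["SMA20"] > 1400.0              # boolean Series selecting rows to mark
x = df.loc[mask, "Date"].values          # NumPy datetime64 array, not a Series
y = df.loc[mask, "Close Price"].values   # NumPy float array, not a Series

plt.plot(df["Date"].values, df["Close Price"].values, label="Close Price")
plt.scatter(x, y, label="SMA20 > 1400", color="red", s=25, marker="<")
plt.legend()
plt.savefig(io.BytesIO(), format="png")
```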

Output of my strategy using SMA5 and SMA12

As you can see, in a range-bound market it creates a lot of whipsaws; however, it was able to capture a very good bull run.

I tried longer-duration SMAs to test another strategy. The second time, I used SMA20 and SMA100, and here is the output

As expected, the longer-duration SMA strategy produces fewer signals and fewer whipsaws.

This result does not mean you should always use longer-duration SMAs. I am planning to run this logic on approximately 200 securities and will update this post with the feedback later.

## How to create new column in a DataFrame based on values from other columns – pandas

Many times we want to create an indicator based on the values of other columns; this can be achieved very easily in pandas.
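A minimal sketch of three common approaches, using hypothetical 'Close Price' and 'SMA' columns: a vectorised comparison for a boolean column, numpy.where() to choose between two labels, and plain arithmetic on existing columns.

```python
import numpy as np
import pandas as pd

# Hypothetical price data; column names are illustrative
df = pd.DataFrame({
    "Close Price": [100.0, 102.0, 101.0, 105.0, 107.0],
    "SMA": [101.0, 101.5, 101.8, 102.0, 103.0],
})

# 1. Vectorised comparison: True where the close is above the SMA
df["Above SMA"] = df["Close Price"] > df["SMA"]

# 2. np.where picks one of two values per row based on a condition
df["Signal"] = np.where(df["Close Price"] > df["SMA"], "BUY", "SELL")

# 3. Arithmetic on existing columns creates a numeric column
df["Gap %"] = (df["Close Price"] - df["SMA"]) / df["SMA"] * 100

print(df)
```

All three operate on whole columns at once, which is usually much faster than looping over rows or calling DataFrame.apply().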