How to find all files in a directory with specific extension using Python

Whenever we need to check whether a file with a specific extension exists in a directory, or get a list of files with a specific extension from a directory, we can do it with Python. Here is a simple program for doing the same.

There are three common ways to do this in Python:

  1. os.walk
  2. os.listdir
  3. glob.glob

All three methods are demonstrated in the program below.
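Here is a minimal sketch of the three approaches. It is not necessarily identical to the original program. The `.txt` extension and the sample folder built in a temporary directory are hypothetical, so the sketch runs anywhere. To search your own folder, call the functions with its path. Only the top level of the folder is searched.

```python
import glob
import os
import tempfile


def files_with_extension_walk(directory, extension):
    """os.walk: take only the first (top-level) tuple, so subfolders are skipped."""
    _root, _dirs, files = next(os.walk(directory), (directory, [], []))
    return sorted(f for f in files if f.endswith(extension))


def files_with_extension_listdir(directory, extension):
    """os.listdir returns files and folders, so keep only regular files."""
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(extension)
        and os.path.isfile(os.path.join(directory, name))
    )


def files_with_extension_glob(directory, extension):
    """glob.glob matches a wildcard pattern; '*' skips names starting with '.'."""
    pattern = os.path.join(directory, "*" + extension)
    return sorted(
        os.path.basename(path) for path in glob.glob(pattern)
        if os.path.isfile(path)
    )


def demo():
    # Hypothetical sample folder so the sketch runs without any setup.
    with tempfile.TemporaryDirectory() as sample_dir:
        for name in ("test.txt", "file3.txt", "notes.log"):
            open(os.path.join(sample_dir, name), "w").close()
        print("*** Program Started ***")
        print("List of Files using os.walk:", files_with_extension_walk(sample_dir, ".txt"))
        print("List of Files using listdir:", files_with_extension_listdir(sample_dir, ".txt"))
        print("List of Files using glob :", files_with_extension_glob(sample_dir, ".txt"))
        print("*** Program Completed ***")


if __name__ == "__main__":
    demo()
```

Sorting the results makes the three lists comparable, because neither `os.walk`, `os.listdir` nor `glob.glob` guarantees any particular order.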

Output of the program

*** Program Started ***
List of Files using os.walk: ['file3.txt', 'list_to_file_in_directory.txt', 'test.txt']
List of Files using listdir: ['file3.txt', 'list_to_file_in_directory.txt', 'test.txt']
List of Files using glob : ['file3.txt', 'list_to_file_in_directory.txt', 'test.txt']
*** Program Completed ***

 

How to count number of files in directory using Python

The number of files in a specific folder can be counted in several ways using Python. Some of them are shown below.
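Here is a minimal sketch of several counting approaches. It only counts the top level of the folder. The `os.scandir` variant is an extra option and not necessarily one of the methods in the output below. The sample folder is hypothetical and is created in a temporary directory.

```python
import glob
import os
import tempfile


def count_walk(directory):
    # The first tuple from os.walk describes the top-level folder; "files" excludes folders.
    _root, _dirs, files = next(os.walk(directory), (directory, [], []))
    return len(files)


def count_listdir_raw(directory):
    # os.listdir also returns folder names, so folders are counted as files here.
    return len(os.listdir(directory))


def count_listdir_files(directory):
    # Keep only entries that are regular files.
    return len([name for name in os.listdir(directory)
                if os.path.isfile(os.path.join(directory, name))])


def count_scandir(directory):
    # os.scandir yields DirEntry objects with a cheap is_file() check.
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


def count_glob(directory):
    # "*" matches files and folders (but not names starting with "."), so filter.
    return len([path for path in glob.glob(os.path.join(directory, "*"))
                if os.path.isfile(path)])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:  # hypothetical sample folder
        for name in ("a.txt", "b.py", "c"):
            open(os.path.join(folder, name), "w").close()
        os.mkdir(os.path.join(folder, "subfolder"))
        print("Number of Files using os.walk :", count_walk(folder))
        print("Number of Files using listdir (raw) :", count_listdir_raw(folder))
        print("Number of Files using listdir + isfile :", count_listdir_files(folder))
        print("Number of Files using scandir :", count_scandir(folder))
        print("Number of Files using glob :", count_glob(folder))
```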

Here is the output

*** Program Started ***
Number of Files using os.walk : 7
Number of Files using listdir method#1 : 8
Number of Files using listdir method#2 : 7
Number of Files using listdir method#3 : 7
Number of Files using glob : 7
*** Program Completed ***

As you can see, the count from listdir method#1 is different. This is because os.listdir() returns folder names as well as file names, so method#1 counts a folder as a file.

 

How to write a list to a file and read a list from file using Python

Many times we need to keep a list for later use. In such cases, it is convenient to write the list to a file and read the file back into a list whenever we need it. Python has very easy methods for achieving this.
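A minimal sketch of writing a list to a file and reading it back. It assumes the list holds strings, and the file name in the system temp folder is hypothetical. Writing "\n" after every item and reading back with `split("\n")` produces the trailing empty string visible in the output below.

```python
import os
import tempfile

# Hypothetical file location; any writable path works.
FILE_PATH = os.path.join(tempfile.gettempdir(), "list_to_file.txt")


def write_list(items, path):
    # One item per line; every line, including the last one, ends with "\n".
    # Items must be strings; convert others with str(item) first.
    with open(path, "w") as f:
        for item in items:
            f.write(item + "\n")


def read_list(path):
    # split("\n") leaves a trailing '' because the file ends with "\n".
    with open(path) as f:
        return f.read().split("\n")


if __name__ == "__main__":
    print("*** Program Started ***")
    write_list(["one", "two", "three", "four"], FILE_PATH)
    lines = read_list(FILE_PATH)
    print("lines :", lines)
    print("type of lines :", type(lines))
    for line in lines:
        print("line :", line)
    print("*** Program Completed ***")
```

If you do not want the trailing empty element, read with `f.read().splitlines()` instead.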

 

Output of the program

*** Program Started ***
lines : ['one', 'two', 'three', 'four', '']
type of lines : <class 'list'>
line : one
line : two
line : three
line : four
line :
*** Program Completed ***

The generated file is shown below.

Here is another version of the same program.
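One way such a variant might look is the sketch below, which opens and closes the file explicitly instead of using a `with` block. The file name is again hypothetical. `try`/`finally` guarantees that `close()` runs even if reading or writing fails, which is what a `with` block does for you automatically.

```python
import os
import tempfile

# Hypothetical file location.
FILE_PATH = os.path.join(tempfile.gettempdir(), "list_to_file_v2.txt")


def write_list(items, path):
    f = open(path, "w")
    try:
        # Same file content as writing item + "\n" for each item.
        f.write("\n".join(items) + "\n")
    finally:
        f.close()  # always close, even if write() raises


def read_list(path):
    f = open(path)
    try:
        return f.read().split("\n")
    finally:
        f.close()


if __name__ == "__main__":
    write_list(["one", "two", "three", "four"], FILE_PATH)
    print("lines :", read_list(FILE_PATH))
```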

The output and the generated file are the same; only the style of opening and closing the file is different.

How to Check if a File Exists using Python

Recently I was working on generating files with Python. Before generating any file, I had to check whether it already existed, so I needed a way to do that in the program. As usual, this task is very easy in Python.

Checking whether a file exists can be done in multiple ways in Python. Here is one using the “os” module.

The os.path module has functions such as isfile, isdir and exists, which help us check whether a file or directory exists.
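A minimal sketch that runs all three os.path checks on one path. The sample folder, file name and missing path are hypothetical and created in a temporary directory.

```python
import os
import tempfile


def describe_path(path):
    """Return the results of the three os.path checks for one path."""
    return {
        "isfile": os.path.isfile(path),  # True only for an existing regular file
        "isdir": os.path.isdir(path),    # True only for an existing directory
        "exists": os.path.exists(path),  # True for either of the above
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:  # hypothetical sample folder
        file_path = os.path.join(folder, "sample.txt")
        open(file_path, "w").close()
        for p in (file_path, folder, os.path.join(folder, "missing.txt")):
            print(p, describe_path(p))
```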

Here is the output

$ python3.6 file_exists_01.py 
file_exists : True
file_exists : False
dir_exists : False
dir_exists : True
dir_exists : False
exists : True
exists : True

If you call isfile() on a directory, the result is False, so use isfile() or isdir() according to your requirement. Alternatively, you can use the exists() function, which returns True if the path refers to an existing file or directory. Please refer to the os.path documentation for further details.

Here is another way to check whether a file exists, using the pathlib module.
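A minimal sketch of the same checks with pathlib, where they are methods on a `Path` object. The sample paths are hypothetical.

```python
import tempfile
from pathlib import Path


def check_with_pathlib(path):
    """Return is_file / is_dir / exists for one path using pathlib."""
    p = Path(path)
    return {"is_file": p.is_file(), "is_dir": p.is_dir(), "exists": p.exists()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:  # hypothetical sample folder
        sample = Path(folder) / "sample.txt"
        sample.touch()  # create an empty file
        for p in (sample, Path(folder), Path(folder) / "missing.txt"):
            print(p, check_with_pathlib(p))
```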

Output is

$ python3.6 file_exists_02.py 
var : False
var : True
var : True
var : False
var : True
var : True

Both modules have similar features, so you can choose whichever is more convenient for you.

How to read an image using Pillow in Python and get image attributes

Pillow is the friendly PIL fork. PIL is the Python Imaging Library. This is the first article in a series of image processing articles using Python.

In this article we will see how to read an image file using Pillow and get its basic attributes.

Here is a simple program to read an image file using Pillow and get its basic attributes.
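A minimal sketch of reading the same attributes with Pillow. It assumes Pillow is installed (`pip install Pillow`). Instead of a real photo, it generates a small hypothetical JPEG in the temp folder so it runs anywhere; pass your own image path to `image_attributes()` instead.

```python
import os
import tempfile

from PIL import Image


def image_attributes(path):
    """Open an image and collect its basic attributes in a dict."""
    # Image.open is lazy: it reads the header, not all of the pixel data.
    with Image.open(path) as im:
        return {
            "format": im.format,
            "size": im.size,  # (width, height)
            "mode": im.mode,
            "filename": im.filename,
            "width": im.width,
            "height": im.height,
            "info": im.info,
        }


if __name__ == "__main__":
    # Hypothetical input image generated on the fly.
    sample = os.path.join(tempfile.gettempdir(), "sample_image.jpg")
    Image.new("RGB", (320, 240), color=(200, 100, 50)).save(sample, "JPEG")
    print("*** Program Started ***")
    for key, value in image_attributes(sample).items():
        print(key, ":", value)
    print("*** Program Ended ***")
```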

Output of this program

*** Program Started ***
im object: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1920x1285 at 0x7F7FB79F6518>
format : JPEG
size : (1920, 1285)
mode : RGB
filename : /home/conquistador/code/github/python-01-utilities/image/input/01_read_image.jpg
width : 1920
height : 1285
info : {'jfif': 257, 'jfif_version': (1, 1), 'jfif_unit': 0, 'jfif_density': (1, 1), 'progressive': 1, 'progression': 1}
*** Program Ended ***

 

How to check file size in Python

While processing files in Python, we often need to know a file's size in bytes, KB or MB. There are multiple ways to get the file size; the following are two simple methods using the os module.
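A minimal sketch of the two os-module methods, plus a helper that converts bytes to KB/MB using 1024-based units. The unit helper and the two sample input files created in a temporary folder are hypothetical additions.

```python
import os
import tempfile


def size_with_getsize(path):
    """Method 1: os.path.getsize returns the size in bytes."""
    return os.path.getsize(path)


def size_with_stat(path):
    """Method 2: os.stat(...).st_size holds the same byte count."""
    return os.stat(path).st_size


def human_readable(num_bytes):
    """Format a byte count with 1024-based units (1 KB = 1024 bytes)."""
    if num_bytes < 1024:
        return f"{num_bytes} Bytes"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB"):
        size /= 1024
        if size < 1024 or unit == "GB":
            return f"{size:.2f} {unit}"


def report(path):
    size = size_with_getsize(path)
    # Check for an empty file before doing any further processing.
    print("Input file is empty" if size == 0 else "Input file is not empty")
    print("File size (in Bytes) :", size)
    print("File size (in Bytes) :", size_with_stat(path))
    print("File size (readable) :", human_readable(size))


if __name__ == "__main__":
    # Hypothetical inputs: one file with some bytes and one empty file.
    with tempfile.TemporaryDirectory() as folder:
        full = os.path.join(folder, "data.bin")
        empty = os.path.join(folder, "empty.txt")
        with open(full, "wb") as f:
            f.write(b"x" * 2048)
        open(empty, "w").close()
        print("*** Program Started ***")
        report(full)
        report(empty)
        print("*** Program Ended ***")
```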

I ran this program for two inputs: one is an image and the other is an empty file.

Here is the output of the program

$ python3.6 file_size.py 
*** Program Started ***
Input file is not empty
File size (in Bytes) : 147162
File size (in Bytes) : 147162
Input file is empty
File size (in Bytes) : 0
File size (in Bytes) : 0
*** Program Ended ***

Actual file size

I have added a check to see whether the file is empty, because you might need to check the file size before doing any processing on a file.

Generating data for Linear Regression using NumPy

We have already seen how to generate random numbers in a previous article. Here we will look at how to generate data in a specific format for linear regression.

To test linear regression, we need one set of data that has a roughly linear relationship and one set of random data. Please find below code to generate data with a linear relationship and random data using Python and NumPy. I have provided graphs which will help you understand the data created by these programs.

Data with Linear Trend for Linear Regression
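A minimal sketch of generating data with a linear trend: y = slope * x + intercept plus Gaussian noise. The slope, intercept, noise level, x range and seed are hypothetical defaults, and NumPy's `default_rng` generator is assumed (NumPy 1.17 or later).

```python
import numpy as np


def linear_data(n=100, slope=2.5, intercept=1.0, noise_std=1.0, seed=0):
    """Return x, y where y follows a straight line plus random noise."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducible data
    x = np.linspace(0, 10, n)          # evenly spaced inputs
    y = slope * x + intercept + rng.normal(0.0, noise_std, size=n)
    return x, y


if __name__ == "__main__":
    x, y = linear_data()
    # np.polyfit with degree 1 returns [slope, intercept] of the best-fit line.
    print("fitted slope and intercept:", np.polyfit(x, y, 1))
```

To draw a graph like the one described above, a scatter plot of `x` against `y` (for example with matplotlib) shows the points scattered around a straight line.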

Data without any Trend for Linear Regression
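A minimal sketch of data with no trend: x and y are drawn independently from a uniform distribution, so there is no relationship for a model to learn. The range, sample size and seed are hypothetical.

```python
import numpy as np


def random_data(n=100, low=0.0, high=10.0, seed=1):
    """Return x, y drawn independently, so y does not depend on x."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=n)
    y = rng.uniform(low, high, size=n)
    return x, y


if __name__ == "__main__":
    x, y = random_data()
    # The correlation coefficient should be close to 0 for unrelated data.
    print("correlation:", np.corrcoef(x, y)[0, 1])
```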

You can use this data as input while training your model.

Python NumPy Tutorial : Getting started with NumPy

NumPy is a BSD-licensed, fundamental package for scientific computing with Python. Its most important feature is a powerful N-dimensional array object, along with sophisticated (broadcasting) functions. It also has useful linear algebra, Fourier transform, and random number capabilities.
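Broadcasting means NumPy stretches a smaller array across a larger one when their shapes are compatible, so no explicit loop is needed. A small sketch with made-up values:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

# The (3,) row is added to each of the two rows of the (2, 3) matrix.
print(matrix + row)
# The (2, 1) column is added across all three columns.
print(matrix + column)
# A scalar is broadcast to every element.
print(matrix * 2)
```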

NumPy can also be used as an efficient multi-dimensional container of generic data. One of its most important features is support for arbitrary data types.
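One way to see arbitrary data types in action is a structured array, where each element is a record with named fields of different types. The record layout and values below are hypothetical.

```python
import numpy as np

# Hypothetical record layout: a fixed-width string, a 32-bit int and a float per row.
person = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])
people = np.array([("Asha", 31, 58.5), ("Ravi", 42, 72.0)], dtype=person)

print(people["name"])        # one field across all records
print(people[1])             # one whole record
print(people["age"].mean())  # a field behaves like an ordinary numeric array
```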

Importing NumPy and creating, accessing and modifying arrays

>>> import numpy as np
>>> a=np.array([1,2,3,4,5,6])
>>> a
array([1, 2, 3, 4, 5, 6])
>>> type(a)
<class 'numpy.ndarray'>
>>> a[1]
2
>>> a[1]=9
>>> a
array([1, 9, 3, 4, 5, 6])
>>> b = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

NumPy array properties

>>> a.shape
(6,)
>>> b.shape
(3, 4)
>>> a.size
6
>>> b.size
12
>>> a.data
<memory at 0x7faf92bb0dc8>
>>> b.data
<memory at 0x7faf92bbba68>
>>> a.dtype
dtype('int64')
>>> b.dtype
dtype('int64')


Mathematical operations on NumPy arrays

>>> a = np.array([1, 2, 3, 4, 5, 6])
>>> a
array([1, 2, 3, 4, 5, 6])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> c = a * 3
>>> c
array([ 3,  6,  9, 12, 15, 18])
>>> a+c
array([ 4,  8, 12, 16, 20, 24])
>>> a-c
array([ -2,  -4,  -6,  -8, -10, -12])
>>> a*c
array([  3,  12,  27,  48,  75, 108])
>>> a/c
array([0.33333333, 0.33333333, 0.33333333, 0.33333333, 0.33333333,
       0.33333333])
>>> np.sqrt(a)
array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
       2.44948974])
>>> np.sum(b,axis=0)
array([15, 18, 21, 24])
>>> np.sum(b,axis=1)
array([10, 26, 42])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> b.T
array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

Working on the arrays

>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> b.reshape(4,3)
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> b
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> b.resize(4,3)
>>> b
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> b[0:2,1:2] #[row range,column range]
array([[2],
       [5]])
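The transcript above shows that reshape() leaves b unchanged while resize() modifies it. A small sketch that makes these differences explicit, with the values taken from the example above:

```python
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# reshape returns a new array object (a view here); b itself keeps its shape.
r = b.reshape(4, 3)
print(b.shape, r.shape)    # (3, 4) (4, 3)

# The ndarray.resize method changes the array in place and returns None.
c = b.copy()
result = c.resize(4, 3)
print(result, c.shape)     # None (4, 3)

# The np.resize function returns a new array; if the new shape needs more
# elements than the input has, it repeats the input data to fill it.
print(np.resize(np.array([1, 2, 3]), (2, 4)))

# Slicing: a range keeps the dimension, a single index drops it.
print(b[0:2, 1:2].shape)   # (2, 1)
print(b[0:2, 1].shape)     # (2,)
```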

Web scraping using Beautiful Soup and Python

The Python programming language has an ecosystem of modules and tools that can be used for scraping data from websites. In this article we will focus on the Beautiful Soup module.

Step#1 Install beautifulsoup and other required modules

To get started, you need a few modules, such as requests and lxml, alongside beautifulsoup. Install the required modules as below:

pip install beautifulsoup4
pip install requests
pip install lxml

Step#2 Understand the HTML tag structure of the web page

Let us try to scrape this Wikipedia page: https://en.wikipedia.org/wiki/List_of_programming_languages

Some observations looking at webpage structure:

  1. There is only one h1 element, and it is the page title
  2. There are multiple h2 elements
  3. Each h2 element is followed by an unordered list
  4. Some tags have attributes such as id, class etc.

Step#3 Fetch the web page

Before extracting the required value, we need to fetch the whole web page. This is done using the requests module.

from bs4 import BeautifulSoup
import requests

url = 'https://en.wikipedia.org/wiki/List_of_programming_languages'
r  = requests.get(url)
data = r.text
print(data)

Step#4 Parse the web page

Now we have the page loaded as HTML in the variable called “data”. If you print it, you will see the HTML tags. To access a required tag, we need to parse this HTML. This is where Beautiful Soup comes into the picture.

soup = BeautifulSoup(data, features = "lxml")

Step#5 Search for the required HTML tag

The variable soup now holds the parsed HTML, which we can search for the required tags.

Let us assume we want to see the content of the header (h1 tag). Here is the final code.
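A minimal sketch of this step, written as a function that takes the HTML text (for example the `data` variable from Step#3) so the parsing can be tried without a network call. The function name and the `parser` parameter are hypothetical; `"lxml"` matches the parser installed in Step#1.

```python
from bs4 import BeautifulSoup


def get_header(html, parser="lxml"):
    """Return the first <h1> tag and its text, or (None, None) if there is none."""
    soup = BeautifulSoup(html, features=parser)
    header = soup.find("h1")  # first matching tag, or None
    if header is None:
        return None, None
    return header, header.text  # .text gives only the text inside the tag


if __name__ == "__main__":
    sample = '<html><body><h1 id="firstHeading">Example title</h1></body></html>'
    header, text = get_header(sample)
    print("header : ", header)
    print("header Text: ", text)
```

With the page fetched in Step#3, calling `get_header(data)` should return the page title tag and its text.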

Output

header :  <h1 class="firstHeading" id="firstHeading" lang="en">List of programming languages</h1>
header Text:  List of programming languages

Please note that we need to use the .text attribute to get the text content of the tag; printing the tag itself shows the whole HTML element.

Here is another example with some additional details
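As a further sketch (not necessarily the example referred to above), the observations from Step#2 can be combined: for each h2 heading, collect the items of the first unordered list that follows it. The function name is hypothetical, and real Wikipedia markup may add elements such as "edit" links inside headings, so the titles may need extra cleaning.

```python
from bs4 import BeautifulSoup


def sections_with_lists(html, parser="lxml"):
    """Map each <h2> heading text to the <li> texts of the next <ul> after it."""
    soup = BeautifulSoup(html, features=parser)
    result = {}
    for h2 in soup.find_all("h2"):
        title = h2.get_text(strip=True)
        # find_next searches everything after this tag in document order.
        ul = h2.find_next("ul")
        items = [li.get_text(strip=True) for li in ul.find_all("li")] if ul else []
        result[title] = items
    return result


if __name__ == "__main__":
    sample = "<h2>A</h2><ul><li>Ada</li><li>ALGOL</li></ul><h2>B</h2><ul><li>BASIC</li></ul>"
    for title, items in sections_with_lists(sample).items():
        print(title, ":", items)
```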

How to get rid of “No parser was explicitly specified” while using Beautiful Soup in Python

While using Beautiful Soup to parse a page, I got the following warning. Although I ignored this warning for some time, it started to become distracting to see it every time I ran my program.


UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 12 of the file filename.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

This warning is harmless and you can continue coding; however, I wanted to fix it for the following reasons:

  1. It is distracting to see this warning every time I run my program.
  2. If I run my program on another machine, it might not behave as expected, since Beautiful Soup will choose whichever parser is available there.

Besides these two primary reasons, I get itchy when I see unformatted code or unnecessary warnings. Many times I have burned my fingers while correcting warnings (read: I was able to fix the warning, but it led to errors, and the whole process ate a considerable amount of my time).

Don’t worry, fixing this warning will not lead to an error.

Before fixing this warning, install lxml:

pip install lxml

To fix this warning, simply replace the following line

soup = BeautifulSoup(data)

with this line

soup = BeautifulSoup(data, features = "lxml")

Now run your program again, and it will run without the warning.
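To confirm the fix programmatically, the warnings Beautiful Soup emits can be captured with Python's `warnings` module. This is a small sketch; the helper name is hypothetical, and `features=None` reproduces the original call without a parser.

```python
import warnings

from bs4 import BeautifulSoup


def parser_warnings(markup, features=None):
    """Parse markup and return the messages of any warnings that were raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        BeautifulSoup(markup, features=features)
    return [str(w.message) for w in caught]


if __name__ == "__main__":
    print("without parser:", parser_warnings("<p>hello</p>"))
    print("with parser   :", parser_warnings("<p>hello</p>", features="lxml"))
```

The first call should report the “No parser was explicitly specified” message, while the second should return an empty list.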