Using JSON dataype for database queries – PostgreSQL

I was torn between using relational database and document database like mongodb, elastic search etc for certain requirement. In fact I started learning elasticsearch but getting some of the tasks done with elasticseach is very painful compared to how easy its done in relational database. This was impacting my timeline and I was spending way too much time doing troubleshooting.

This is when I came across json datatype in PostgreSQL. This is combinations of easiness of relational database and document query capabilities of elastic seach. Of course this would be bit slower than elasticseach (I have not benchmarked the performance, buts its guess) but it is OK as I am in prototype phase, I just need to validate my business case, once it flies, I will make a switch be relevant document database.

Meanwhile, lets us see how to work with JSON datatype of PostgreSQL

select clause

Because -> operator returns a JSON object, you can chain it with the operator ->> to retrieve a it in text format.

SELECT id, productid,  productdetails -> 'productBaseInfoV1' ->> 'productUrl' AS customer
FROM public.ecomm_productdetails;
SELECT id, productid,  productdetails -> 'productBaseInfoV1' -> 'productUrl' AS customer
FROM public.ecomm_productdetails;

where clause

Please note you always need to use final element with ->> since this needs to be matched with text field.

SELECT id, productid,  productdetails
FROM public.ecomm_productdetails
WHERE productdetails -> 'productBaseInfoV1' ->> 'title' = 'Apple iPhone 6 (Grey, 128 GB)';

 

SELECT id, productid,  productdetails
FROM public.ecomm_productdetails
WHERE productdetails -> 'productBaseInfoV1' ->> 'title' like 'OPPO%';

How to insert data from pandas to PostgreSQL

When we have done our data analysis using pandas and now need to store this analysis, we can use to_csv option. However if data is too big, it make sense to store this in database.
Let us have a look at two simple methods to store data into PostgreSQL database.

Using sqlalchemy

 

import pandas as pd
from sqlalchemy import create_engine

df =pd.read_csv('input_file.csv',parse_dates=['timestamp'],index_col=0)
#df.info()

engine = create_engine('postgresql://admin:admin@localhost:5432/ibnse')
df.to_sql('stk_1min1', engine,if_exists='append')

Using psycopg2

 

pandas as pd
import psycopg2
conn = psycopg2.connect(database='dbname', user="user", password="password", host="127.0.0.1", port="5432")
cur = conn.cursor()

df =pd.read_csv('input_file.csv',parse_dates=['timestamp'],index_col=0)
for index, row in df.iterrows():
    insertdata =  "('"+str(index)+ "','"+row[0]+"','"+str(row[1])+"','"+str(row[2])+"','"+str(row[3])+"','"+str(row[4])+"','"+str(row[5])+"','"+str(row[6])+"','"+str(row[7])+"','"+str(row[8])+"')"
    print("insertdata :",insertdata)

    try:
        cur.execute("INSERT INTO stk_1min1 values "+insertdata)
        print( "row inserted:", insertdata)
    except psycopg2.IntegrityError:
        print( "Row already exist ")
        pass
    except Exception as e:
        print( "some insert error:", e, "ins: ", insertdata)
    conn.commit()

Few points while using sqlalchemy

  • If table does not exists, it will get created.
  • If table exists and you want to append, you need to use if_exists='append',  Its wise choice to use this option in most of the cases.
  • When you use sqlalchemy, whole pandas dataframe will not be inserted even if you get unique index error for one single record.

Using raw Query to Fetch data from PostgreSQL in Django

Instead of using ORM, if you want to hard code a query in django, you can refer to below code

def getprices(request):
    print("hello backend")
    conn = psycopg2.connect(database="ecommpedia1", user="admin", password="admin", host="127.0.0.1", port="5432")
    cur = conn.cursor()   
    productid = "CAME747PNEUYMSHT"
    cur.execute("SELECT * FROM ecomm_time WHERE  flipkartproductid = '"+productid+"';") 
    mobile_records = cur.fetchall()
    print("mobile_records : ", mobile_records)
    print(type(mobile_records))
    return HttpResponse(mobile_records)