How to update or insert field in Elasticsearch using Python

Updating and inserting a field are done through the same _update API, which is the most useful part: it behaves like an upsert at the field level. If the field already exists it gets updated, and if it does not exist it gets added to the document.
Here is the document we start with:

import requests
import json

uri = 'http://localhost:9200/book/_doc/1'
headers1 = {'Content-Type': 'application/json'}
response = requests.get(uri, headers=headers1)
print("*************************************")
print("Accessing document before update : ", response.text)

Terminal output

Accessing document before update :  {
      "isbn": "9781593275846",
      "title": "Eloquent JavaScript, Second Edition",
      "subtitle": "A Modern Introduction to Programming",
      "author": "Marijn Haverbeke",
      "published": "2014-12-14T00:00:00.000Z",
      "publisher": "No Starch Press",
      "pages": 472,
      "description": "JavaScript lies at the heart of almost every modern web application, from social apps to the newest browser-based games. Though simple for beginners to pick up and play with, JavaScript is a flexible, complex language that you can use to build full-scale applications.",
      "website": "http://eloquentjavascript.net/"
    }

Now let us look at the code to update and insert fields in the document. Please note that there are multiple ways of doing an update / insert.

import requests
import json

####################################################################### Insert one field in document
uri = 'http://localhost:9200/book/_update/1'
headers1 = {'Content-Type': 'application/json'}
query = json.dumps({
    "script": "ctx._source.price = 365.65"
})
response = requests.post(uri, headers=headers1, data=query)
print("*************************************")
print(" Output", response.text)

####################################################################### Insert two fields in document
uri = 'http://localhost:9200/book/_update/1'
headers1 = {'Content-Type': 'application/json'}
query = json.dumps({
    "doc": {
        "summary": "summary3",
        "coauth": "co author name"
    }
})
response = requests.post(uri, headers=headers1, data=query)
print("*************************************")
print(" Output", response.text)

####################################################################### Insert nested field in document
uri = 'http://localhost:9200/book/_update/1'
headers1 = {'Content-Type': 'application/json'}
query = json.dumps({
    "doc": {
        "country": {
            "continent": "Asia",
            "code": 91
        },
        "coauthor_two": "Another co-author"
    }
})
response = requests.post(uri, headers=headers1, data=query)
print("*************************************")
print(" Output", response.text)

####################################################################### Update field
uri = 'http://localhost:9200/book/_update/1'
headers1 = {'Content-Type': 'application/json'}
query = json.dumps({
    "doc": {
        "title": "Eloquent JavaScript, Second Edition - Updated using API from Python Script"
    }
})
response = requests.post(uri, headers=headers1, data=query)
print("*************************************")
print(" Output", response.text)
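The snippets above assume the document already exists. If you also want the document itself to be created when the id does not exist, the _update API accepts a "doc_as_upsert" flag. Here is a minimal sketch of that variant; the field values and document id 2 are just illustrative:

```python
import json

# "doc_as_upsert": True tells the _update endpoint to index the given
# doc as a brand-new document when the id does not exist yet.
payload = json.dumps({
    "doc": {"price": 365.65},
    "doc_as_upsert": True
})

def upsert_doc(uri='http://localhost:9200/book/_update/2'):
    # Same requests pattern as above; only needed with a running cluster.
    import requests
    headers1 = {'Content-Type': 'application/json'}
    return requests.post(uri, headers=headers1, data=payload)
```

With this flag, the same call works whether or not document 2 exists, so you do not have to check first with a GET.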

Once we have run the above script, let us look at the document again by running the program below, or by checking Kibana.

import requests
import json

uri = 'http://localhost:9200/book/_source/1'
headers1 = {'Content-Type': 'application/json'}
response = requests.get(uri, headers=headers1)
print("*************************************")
print("Accessing document After update : ", response.text)

Output

Accessing document After update : {
    "isbn" : "9781593275846",
    "title" : "Eloquent JavaScript, Second Edition - Updated using API from Python Script",
    "subtitle" : "A Modern Introduction to Programming",
    "author" : "Marijn Haverbeke",
    "published" : "2014-12-14T00:00:00.000Z",
    "publisher" : "No Starch Press",
    "pages" : 472,
    "description" : "JavaScript lies at the heart of almost every modern web application, from social apps to the newest browser-based games. Though simple for beginners to pick up and play with, JavaScript is a flexible, complex language that you can use to build full-scale applications.",
    "website" : "http://eloquentjavascript.net/",
    "price" : 365.65,
    "summary" : "summary3",
    "coauth" : "co author name",
    "coauthor_two" : "Another co-author",
    "country" : {
      "continent" : "Asia",
      "code" : 91
    }
}

Kibana output

Elasticsearch API Search document queries

Elasticsearch has a very detailed search API, but it is a bit different for someone coming from an RDBMS and SQL query background. Here are some sample queries.

Search all

GET /ecomm/_search
{
    "query": {
        "match_all": {}
    }
}

Search for a specific key

GET /twitter/_search
{
    "query": {
        "match": { "user":"ssss"}
    }
}

Search using URL

GET /ecomm/_search?q=Apple

Detailed search for nested value

GET /ecomm/_search
{
    "query": {
        "match" : {
            "productBaseInfoV1.productId": "MOBFKCTSYAPWYFJ5"
        }
    }
}

Search with more parameters

POST /ecomm/_search
{
    "query": {
        "match" : {
            "productBaseInfoV1.productId": "MOBFKCTSYAPWYFJ5"
        }
    },
    "size": 1,
    "from": 0,
    "_source": [ "productBaseInfoV1.productId", "productBaseInfoV1.title", "imageUrls","productDescription" ]
}
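All of the console queries above can also be sent from Python with the same requests pattern used elsewhere in this post. Below is a small sketch; the helper names build_search and search_index are mine, and it assumes Elasticsearch on localhost:9200:

```python
import json

def build_search(query, size=10, frm=0, source=None):
    # Assemble a search body like the console examples above.
    body = {"query": query, "size": size, "from": frm}
    if source is not None:
        body["_source"] = source   # restrict returned fields
    return json.dumps(body)

def search_index(index, query, **kwargs):
    # Only needed when actually talking to a cluster.
    import requests
    uri = 'http://localhost:9200/%s/_search' % index
    headers1 = {'Content-Type': 'application/json'}
    resp = requests.post(uri, headers=headers1, data=build_search(query, **kwargs))
    return resp.json()

# e.g. the nested-field match from above:
# search_index("ecomm", {"match": {"productBaseInfoV1.productId": "MOBFKCTSYAPWYFJ5"}}, size=1)
```

The same helper covers match_all and match queries; only the query dict changes.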

Accessing Elasticsearch API from Python Script

Elasticsearch provides an easy-to-use API that can be accessed from Kibana, Postman, a browser, and curl. You can read here how to access the Elasticsearch API from these options.

In this post we will look at a very simple example of accessing the Elasticsearch API from Python. Here is the example along with its results.

import requests
import json

uri = 'http://localhost:9200/_cat/indices'
headers1 = {'Content-Type': 'application/json'}
response = requests.get(uri, headers=headers1)
print("this is : ", response.text)

uri = 'http://localhost:9200/twitter/_doc/7'
headers1 = {'Content-Type': 'application/json'}
response = requests.get(uri, headers=headers1)
print("*************************************")
print("Accessing document before creating : ", response.text)

uri = 'http://localhost:9200/twitter/_doc/7'
headers1 = {'Content-Type': 'application/json'}
# The document body is indexed as-is; it does not need a "query" wrapper.
query = json.dumps({
    "user": "NewUser",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elasticsearch from python"
})
response = requests.put(uri, headers=headers1, data=query)
print("*************************************")
print("Document is created ", response.text)

uri = 'http://localhost:9200/twitter/_doc/7'
headers1 = {'Content-Type': 'application/json'}
response = requests.get(uri, headers=headers1)
print("*************************************")
print("Accessing newly created document: ", response.text)

Here is the output:


$ python3 second.py 
/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.25.7) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
this is :  yellow open ecommpediatest               R6WypBc2RtCS_ITA40_rFw 1 1    0 0    283b    283b
yellow open test-index                   OeILxjmFRhODhztu4GgE_w 1 1    1 0   4.5kb   4.5kb
yellow open twitter                      BUmjGbLpTNCbnDIPxht9vA 1 1   13 4  24.8kb  24.8kb
yellow open bookindex                    RKq8oJKvRb2HQHxuPZdEbw 1 1    1 0   5.4kb   5.4kb
green  open .kibana_task_manager_1       pc_LXegKQWu9690vT1Z-pA 1 0    2 1  16.9kb  16.9kb
green  open kibana_sample_data_ecommerce FkD1obNSSK6mfLDsy9ILPQ 1 0 4675 0   4.9mb   4.9mb
yellow open facebook                     4Wzax6UhThm5rXi03PiG7w 1 1    2 0     7kb     7kb
green  open .apm-agent-configuration     sxoXMmsoS4mRPyjcaPByzw 1 0    0 0    283b    283b
green  open .kibana_1                    pK7pO-SCSBORonZLDi8Vew 1 0   60 2 945.9kb 945.9kb
yellow open twitter1                     3iMzcYoJR3qY9JiM2sY8_g 1 1    1 0   4.5kb   4.5kb

*************************************
Accessing document before creating :  {"_index":"twitter","_type":"_doc","_id":"7","found":false}
*************************************
Document is created  {"_index":"twitter","_type":"_doc","_id":"7","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":28,"_primary_term":2}
*************************************
Accessing newly created document:  {"_index":"twitter","_type":"_doc","_id":"7","_version":1,"_seq_no":28,"_primary_term":2,"found":true,"_source":{"user": "NewUser", "post_date": "2009-11-15T14:12:12", "message": "trying out Elasticsearch from python"}}

We will have a look at more complex queries from Python at a later stage.

How to use Elasticsearch API

Elasticsearch exposes REST APIs that are used by the UI components and can be called directly to configure and access Elasticsearch features.

Show current indices

http://localhost:9200/_cat/indices

Output

yellow open test-index                   OeILxjmFRhODhztu4GgE_w 1 1    1 0   4.5kb   4.5kb
yellow open twitter                      BUmjGbLpTNCbnDIPxht9vA 1 1    1 0   3.7kb   3.7kb
yellow open bookindex                    RKq8oJKvRb2HQHxuPZdEbw 1 1    1 0   5.4kb   5.4kb
green  open kibana_sample_data_ecommerce FkD1obNSSK6mfLDsy9ILPQ 1 0 4675 0   4.9mb   4.9mb
green  open .kibana_task_manager_1       pc_LXegKQWu9690vT1Z-pA 1 0    2 1  16.9kb  16.9kb
yellow open facebook                     4Wzax6UhThm5rXi03PiG7w 1 1    1 0   3.5kb   3.5kb
green  open .apm-agent-configuration     sxoXMmsoS4mRPyjcaPByzw 1 0    0 0    283b    283b
green  open .kibana_1                    pK7pO-SCSBORonZLDi8Vew 1 0   60 2 949.5kb 949.5kb

This and the commands mentioned below can be run from Kibana, Postman, a browser, or cURL.

Curl

$ curl "localhost:9200/_cat/indices"
yellow open ecommpediatest               R6WypBc2RtCS_ITA40_rFw 1 1    0 0    283b    283b
yellow open test-index                   OeILxjmFRhODhztu4GgE_w 1 1    1 0   4.5kb   4.5kb
yellow open twitter                      BUmjGbLpTNCbnDIPxht9vA 1 1   11 5  26.9kb  26.9kb
yellow open bookindex                    RKq8oJKvRb2HQHxuPZdEbw 1 1    1 0   5.4kb   5.4kb
green  open .kibana_task_manager_1       pc_LXegKQWu9690vT1Z-pA 1 0    2 1  16.9kb  16.9kb
green  open kibana_sample_data_ecommerce FkD1obNSSK6mfLDsy9ILPQ 1 0 4675 0   4.9mb   4.9mb
green  open .apm-agent-configuration     sxoXMmsoS4mRPyjcaPByzw 1 0    0 0    283b    283b
yellow open facebook                     4Wzax6UhThm5rXi03PiG7w 1 1    2 0     7kb     7kb
green  open .kibana_1                    pK7pO-SCSBORonZLDi8Vew 1 0   60 2 945.9kb 945.9kb
yellow open twitter1                     3iMzcYoJR3qY9JiM2sY8_g 1 1    1 0   4.5kb   4.5kb

Kibana


Postman


Browser

How to create a new index

PUT /ecommpediatest

The same can be achieved with the create-new-document command:

PUT /twitter1/_doc/1
{
"user" : "kimchy updated",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
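Both variants can be sketched from Python as well. This is a sketch under the same assumptions as the console examples above (default host and port, index names ecommpediatest and twitter1); the helper names are mine:

```python
import json

def index_uri(name, doc_id=None, host='http://localhost:9200'):
    # PUT /<name> creates an empty index; PUT /<name>/_doc/<id> creates
    # the index implicitly while indexing the first document.
    if doc_id is None:
        return '%s/%s' % (host, name)
    return '%s/%s/_doc/%s' % (host, name, doc_id)

def create_index(name, doc_id=None, doc=None):
    # Only needed when actually calling the cluster.
    import requests
    headers1 = {'Content-Type': 'application/json'}
    data = None if doc is None else json.dumps(doc)
    return requests.put(index_uri(name, doc_id), headers=headers1, data=data)

# create_index('ecommpediatest')                      # empty index
# create_index('twitter1', 1, {"user": "kimchy"})     # index via first document
```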

Creating or updating records

------------------- Get records
GET twitter/_doc/1
HEAD twitter/_doc/1
GET twitter/_source/10
HEAD twitter/_source/1
-------------------update API
PUT /twitter/_doc/1
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}

POST /twitter/_doc/2
{
"user" : "tom",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
PUT /twitter/_create/2
{
"user" : "Jerry",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
POST /twitter/_create/3
{
"user" : "cartoon",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
-------------------delete
DELETE /twitter/_doc/1
-------------------- create
PUT /twitter/_doc/1
{
"user" : "one",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}

POST /twitter/_doc/
{
"user" : "one",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}

PUT /twitter/_create/aa
{
"user" : "xxx",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
POST /twitter/_create/ss
{
"user" : "ssss",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
------------------------------------------------------search
GET /_search
{
"query": {
"ids" : {
"values" : ["1", "ss", "100"]
}
}
}
GET /twitter/_search
{
"query": {
"ids" : {
"values" : [11,"ss"]
}
}
}
GET /twitter/_search
{
"query": {
"match_all": {}
}
}

GET /twitter/_search
{
"query": {
"match": { "user":"ssss"}
}
}


How to Install and Configure Kibana on Ubuntu

A picture is worth a thousand log lines. Kibana gives you the freedom to select the way you give shape to your data. Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack so you can do anything from tracking query load to understanding the way requests flow through your apps.
Install Kibana

The packages are signed with the Elasticsearch Signing Key. Download and install the public signing key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

You may need to install the apt-transport-https package on Ubuntu before proceeding:

$ sudo apt-get install apt-transport-https
Reading package lists... Done
Building dependency tree
Reading state information... Done
apt-transport-https is already the newest version (1.6.12).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Save the repository definition to /etc/apt/sources.list.d/elastic-7.x.list:

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

You can install the Kibana Debian package with:

sudo apt-get update
sudo apt-get install kibana

You can start and stop the Kibana service with the following commands:

sudo systemctl start kibana.service
sudo systemctl stop kibana.service

Once you have started the Kibana service, let us check its status:

$ sudo systemctl status kibana.service
● kibana.service - Kibana
Loaded: loaded (/etc/systemd/system/kibana.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2020-01-07 21:25:22 IST; 9s ago
Main PID: 6226 (node)
Tasks: 11 (limit: 4915)
CGroup: /system.slice/kibana.service
└─6226 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli -c /etc/kibana/kibana.yml

Let us try to access the Kibana server at http://localhost:5601

Now we need to make the following changes in /etc/kibana/kibana.yml:

server.port: 5601
server.host: "localhost"
elasticsearch.hosts: ["http://localhost:9200"]

Now let us try to access Kibana again at http://localhost:5601

Here is a sample dashboard.

How to Install and Configure Elasticsearch on Ubuntu

Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. Elasticsearch is the central component of the Elastic Stack, a set of open source tools for data ingestion, enrichment, storage, analysis, and visualization, commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana).

Elasticsearch is known for its simple REST APIs, distributed nature, speed, and scalability. You can use HTTP methods like GET, POST, PUT, and DELETE in combination with a URL to manipulate your data.
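As a rough orientation, the verbs map onto document operations like this; the index name twitter and the document id 1 are just placeholders:

```python
# Illustrative verb-to-URL map for the Elasticsearch document APIs.
host = 'http://localhost:9200'
operations = {
    'read':   ('GET',    host + '/twitter/_doc/1'),     # fetch a document
    'create': ('PUT',    host + '/twitter/_doc/1'),     # index with explicit id
    'update': ('POST',   host + '/twitter/_update/1'),  # partial update
    'delete': ('DELETE', host + '/twitter/_doc/1'),     # remove a document
}
```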

Prerequisites

Elasticsearch is developed in Java, so you need to have Java installed on your Ubuntu machine. Use the following command to check whether Java is installed and which version:

$ java --version
openjdk 11.0.5 2019-10-15
OpenJDK Runtime Environment (build 11.0.5+10-post-Ubuntu-0ubuntu1.118.04)
OpenJDK 64-Bit Server VM (build 11.0.5+10-post-Ubuntu-0ubuntu1.118.04, mixed mode, sharing)

Now check whether the $JAVA_HOME variable is configured by printing its contents with the following command. If it returns blank, you need to set it up. Check the post How to set Java environment path $JAVA_HOME in Ubuntu for setting up $JAVA_HOME.

$ echo $JAVA_HOME
/usr/java/java-1.11.0-openjdk-amd6

Setting up Elasticsearch

The packages are signed with the Elasticsearch Signing Key. Download and install the public signing key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

You may need to install the apt-transport-https package on Ubuntu before proceeding:

$ sudo apt-get install apt-transport-https
Reading package lists... Done
Building dependency tree
Reading state information... Done
apt-transport-https is already the newest version (1.6.12).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Now save the repository definition to /etc/apt/sources.list.d/elastic-7.x.list:

$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
deb https://artifacts.elastic.co/packages/7.x/apt stable main

You can install the Elasticsearch Debian package with:

sudo apt-get update 
sudo apt-get install elasticsearch

Configure Elasticsearch

The Elasticsearch configuration files are located in the directory /etc/elasticsearch. There are two files:

  1. elasticsearch.yml — configures the Elasticsearch server settings.
  2. logging.yml — provides the configuration for logging.

Let us update elasticsearch.yml:

sudo nano /etc/elasticsearch/elasticsearch.yml

There are multiple variables, but you need to update the following ones:

  • cluster.name: specifies the name of the cluster
  • node.name: specifies the name of the server (node). Please note the double quotation marks.

Please do not change network.host. You might see that in some posts people have set network.host to 0.0.0.0 or 127.0.0.1. The cluster coordination algorithm changed in 7.0, and in order to be safe it requires some specific configuration. This requirement is relaxed when you bind to localhost only, but if/when you change network.host, Elasticsearch enforces that you configure the cluster safely.

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: mycluster1
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: "node-1"
#
# Add custom attributes to the node:
#
#node.attr.rack: r1


...

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#

Launch Elasticsearch

You can start and stop Elasticsearch using the following commands:

sudo systemctl start elasticsearch.service
sudo systemctl stop elasticsearch.service

Testing Elasticsearch

You can check connectivity using curl. Install curl if you don't have it on your machine and run the following command:

$ curl -X GET "http://localhost:9200/?pretty"
{
  "name" : "node-1",
  "cluster_name" : "mycluster1",
  "cluster_uuid" : "Q6GqVVJfRZO6KSHQ-pFbcQ",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
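The same check can be done from Python, consistent with the requests-based examples earlier in this series. A minimal sketch, assuming the default host and port; the helper names are mine:

```python
def info_uri(host='http://localhost:9200'):
    # Python equivalent of: curl -X GET "http://localhost:9200/?pretty"
    return host + '/?pretty'

def cluster_info():
    # Only needed when actually calling the cluster.
    import requests
    resp = requests.get(info_uri())
    resp.raise_for_status()   # fail loudly if the node is not reachable
    return resp.json()

# print(cluster_info()["cluster_name"], cluster_info()["version"]["number"])
```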

How to install tensorflow on windows

TensorFlow is very easy to install. Let us look at how to get started with TensorFlow.

pip install tensorflow
C:\Users\ABCDEFG>pip install tensorflow
Collecting tensorflow
  Downloading https://files.pythonhosted.org/packages/05/cd/c171d2e33c0192b04560ce864c26eba83fed888fe5cd9ded661b2702f2ae/tensorflow-1.12.0-cp36-cp36m-win_amd64.whl (45.9MB)
    71% |██████████████████████          | 32.6MB 119kB/s eta 0:01:52


Designing My Own “Data Science & Machine Learning” Program

I have already completed my post-graduation in business administration with a specialization in finance, which had some exposure to analytics as well. Three years after completing it, I got an itch to learn something about data science. Having said that, I can't imagine going for another PG, but this should not stop me from learning it.

In the age of MOOCs, online learning, YouTube, and blogs, most of the resources are available to everyone. Yes, it is a pain to identify good resources, but they are there. I decided to go through some reputed data science program curricula and derive a program for data science from freely available resources. I have considered the following courses.

  • Master of Science in Business Analytics by UT at Austin
  • MS in Data Science by NYU
  • Master of Science in Data Science by Columbia University
  • MS in Computational Data Science by Carnegie Mellon University
  • Certificate Program in Business Analytics by ISB

I also looked at some already created road maps.

  • http://datasciencemasters.org/
  • http://www.datasciguide.com/

These road maps are huge; in fact, the authors have mentioned that it is not humanly possible to go through every resource, so whoever follows them should do so selectively.

The list of machine learning topics is huge; if you try to learn everything, you will lose momentum. So when you start learning machine learning, consider the following points.

Firstly, and most importantly, choose your niche and start working on it. If you have not selected a niche yet, start reading more about the different subtopics, but don't delay deciding on your niche.

And secondly, don't wait until you complete all the learning; start getting your hands dirty as early as possible.

Data Science in Python by University of Michigan

Month#1

Python Basics, NumPy, SciPy, Pandas and Matplotlib using following two courses:

Introduction to Data Science in Python

Basic Statistics

Applied Plotting, Charting & Data Representation in Python

You can do the Applied Plotting, Charting & Data Representation course at a later stage, but I recommend you complete the Python and basic statistics courses before you start the machine learning course.

Month#2

Applied Machine Learning in Python

I am still analysing the next steps and will add those sections once I figure them out. Since we now have basic programming skills as well as a basic understanding of machine learning, this is the right time to learn the required mathematics, statistics, and probability before we venture ahead.

Please view this video as well


Important blogs and Websites for Data Science

Learning and practicing Data Science

Important Courses

Free Books (All legit)

Data Science Blogs to Follow