How to update or insert a field in Elasticsearch using Python

Updating or inserting a field is done through the update API. The most useful part is that it behaves like an upsert, so we use the same API in both cases: if the field already exists it is updated, and if it does not exist it is inserted.
My first record is shown below.

import requests
import json

uri='http://localhost:9200/book/_doc/1'
headers1 ={'Content-Type': 'application/json'}
response = requests.get(uri,headers=headers1)
print("*************************************")
print("Accessing document before update : ", response.text)

Terminal output

Accessing document before update :  {
      "isbn": "9781593275846",
      "title": "Eloquent JavaScript, Second Edition",
      "subtitle": "A Modern Introduction to Programming",
      "author": "Marijn Haverbeke",
      "published": "2014-12-14T00:00:00.000Z",
      "publisher": "No Starch Press",
      "pages": 472,
      "description": "JavaScript lies at the heart of almost every modern web application, from social apps to the newest browser-based games. Though simple for beginners to pick up and play with, JavaScript is a flexible, complex language that you can use to build full-scale applications.",
      "website": "http://eloquentjavascript.net/"
    }

Now let us look at the code to update and insert fields in the document. Please note that there are multiple ways of doing an update / insert.

import requests
import json
####################################################################### Insert one field in document
uri='http://localhost:9200/book/_update/1'
headers1 ={'Content-Type': 'application/json'}
query = json.dumps(
    {
    "script" : "ctx._source.price = 365.65"
    }
)
response = requests.post(uri,headers=headers1,data=query)
print("*************************************")
print(" Output", response.text)

####################################################################### Insert two fields in document
uri='http://localhost:9200/book/_update/1'
headers1 ={'Content-Type': 'application/json'}
query = json.dumps(
{
  "doc": {
    "summary":"summary3",
    "coauth":"co author name"
  }
}
)
response = requests.post(uri,headers=headers1,data=query)
print("*************************************")
print(" Output", response.text)

####################################################################### Insert nested field in document
uri='http://localhost:9200/book/_update/1'
headers1 ={'Content-Type': 'application/json'}
query = json.dumps(
    {
    "doc": {
        "country":{
        "continent":"Asia",
        "code" : 91
        },
        "coauthor_two":"Another co-author"
    }
    }
)
response = requests.post(uri,headers=headers1,data=query)
print("*************************************")
print(" Output", response.text)

####################################################################### Update field
uri='http://localhost:9200/book/_update/1'
headers1 ={'Content-Type': 'application/json'}
query = json.dumps(
{
  "doc": {
    "title":"Eloquent JavaScript, Second Edition - Updated using API from Python Script"
  }
}
)
response = requests.post(uri,headers=headers1,data=query)
print("*************************************")
print(" Output", response.text)
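
All of the examples above assume that document 1 already exists. If you also want the same request to create the document when it is missing (the upsert behaviour mentioned at the start), the _update API accepts a doc_as_upsert flag. Here is a minimal sketch along the same lines.

import requests
import json
####################################################################### Upsert a field (create document if missing)
uri='http://localhost:9200/book/_update/1'
headers1 ={'Content-Type': 'application/json'}
# doc_as_upsert tells Elasticsearch to index "doc" as a new document
# when id 1 does not exist yet, otherwise merge it into the existing one
query = json.dumps(
    {
    "doc": {
        "price": 365.65
    },
    "doc_as_upsert": True
    }
)
response = requests.post(uri,headers=headers1,data=query)
print("*************************************")
print(" Output", response.text)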

Once we have run the above script, let us look at the document again by running the program below or by checking Kibana.

import requests
import json
uri='http://localhost:9200/book/_source/1'
headers1 ={'Content-Type': 'application/json'}
response = requests.get(uri,headers=headers1)
print("*************************************")
print("Accessing document After update : ", response.text)

Output

Accessing document After update : {
    "isbn" : "9781593275846",
    "title" : "Eloquent JavaScript, Second Edition - Updated using API from Python Script",
    "subtitle" : "A Modern Introduction to Programming",
    "author" : "Marijn Haverbeke",
    "published" : "2014-12-14T00:00:00.000Z",
    "publisher" : "No Starch Press",
    "pages" : 472,
    "description" : "JavaScript lies at the heart of almost every modern web application, from social apps to the newest browser-based games. Though simple for beginners to pick up and play with, JavaScript is a flexible, complex language that you can use to build full-scale applications.",
    "website" : "http://eloquentjavascript.net/",
    "price" : 365.65,
    "summary" : "summary3",
    "coauth" : "co author name",
    "coauthor_two" : "Another co-author",
    "country" : {
      "continent" : "Asia",
      "code" : 91
    }
}

Kibana output

Elasticsearch API Search document queries

Elasticsearch has a very detailed search API, but it is a bit different for someone coming from an RDBMS and SQL query background. Here are some sample queries.

Search all

GET /ecomm/_search
{
    "query": {
        "match_all": {}
    }
}

Search for specific key

GET /twitter/_search
{
    "query": {
        "match": { "user":"ssss"}
    }
}

Search using URL

GET /ecomm/_search?q=Apple

Detailed search for nested value

GET /ecomm/_search
{
    "query": {
        "match" : {
            "productBaseInfoV1.productId": "MOBFKCTSYAPWYFJ5"
        }
    }
}

Search with more parameters

POST /ecomm/_search
{
    "query": {
        "match" : {
            "productBaseInfoV1.productId": "MOBFKCTSYAPWYFJ5"
        }
    },
    "size": 1,
    "from": 0,
    "_source": [ "productBaseInfoV1.productId", "productBaseInfoV1.title", "imageUrls","productDescription" ]
}
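
These console queries map one-to-one to HTTP requests, so any of them can also be issued from Python with requests. Below is a minimal sketch for the match_all query against the ecomm index (assuming that index exists on your local node); the other queries only differ in the JSON body.

import requests
import json

# _search accepts the query as a JSON body; match_all returns every document
uri='http://localhost:9200/ecomm/_search'
headers1 ={'Content-Type': 'application/json'}
query = json.dumps(
    {
    "query": {
        "match_all": {}
    }
    }
)
response = requests.post(uri,headers=headers1,data=query)
print("Search output : ", response.text)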

Accessing Elasticsearch API from Python Script

Elasticsearch provides an easy-to-use API, and it can be accessed from Kibana, Postman, a browser, or curl. You can read here how to access the Elasticsearch API using these options.

In this post we will look at a very simple example of accessing the Elasticsearch API from Python. Here is a simple example along with the results.

import requests
import json

uri='http://localhost:9200/_cat/indices'
headers1 ={'Content-Type': 'application/json'}
response = requests.get(uri,headers=headers1)
print("this is : ", response.text)

uri='http://localhost:9200/twitter/_doc/7'
headers1 ={'Content-Type': 'application/json'}
response = requests.get(uri,headers=headers1)
print("*************************************")
print("Accessing document before creating : ", response.text)

uri='http://localhost:9200/twitter/_doc/7'
headers1 ={'Content-Type': 'application/json'}
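# Note: the body below is indexed verbatim as the document _source,
# including the "query" wrapper, as you can see in the output further below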
query = json.dumps({
    "query": {
            "user" : "NewUser",
            "post_date" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch from python"
            }
        })
response = requests.put(uri,headers=headers1,data=query)
print("*************************************")
print("Document is created ", response.text)

uri='http://localhost:9200/twitter/_doc/7'
headers1 ={'Content-Type': 'application/json'}
response = requests.get(uri,headers=headers1)
print("*************************************")
print("Accessing newly created document: ", response.text)

Here is the output


$ python3 second.py 
/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.25.7) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
this is :  yellow open ecommpediatest               R6WypBc2RtCS_ITA40_rFw 1 1    0 0    283b    283b
yellow open test-index                   OeILxjmFRhODhztu4GgE_w 1 1    1 0   4.5kb   4.5kb
yellow open twitter                      BUmjGbLpTNCbnDIPxht9vA 1 1   13 4  24.8kb  24.8kb
yellow open bookindex                    RKq8oJKvRb2HQHxuPZdEbw 1 1    1 0   5.4kb   5.4kb
green  open .kibana_task_manager_1       pc_LXegKQWu9690vT1Z-pA 1 0    2 1  16.9kb  16.9kb
green  open kibana_sample_data_ecommerce FkD1obNSSK6mfLDsy9ILPQ 1 0 4675 0   4.9mb   4.9mb
yellow open facebook                     4Wzax6UhThm5rXi03PiG7w 1 1    2 0     7kb     7kb
green  open .apm-agent-configuration     sxoXMmsoS4mRPyjcaPByzw 1 0    0 0    283b    283b
green  open .kibana_1                    pK7pO-SCSBORonZLDi8Vew 1 0   60 2 945.9kb 945.9kb
yellow open twitter1                     3iMzcYoJR3qY9JiM2sY8_g 1 1    1 0   4.5kb   4.5kb

*************************************
Accessing document before creating :  {"_index":"twitter","_type":"_doc","_id":"7","found":false}
*************************************
Document is created  {"_index":"twitter","_type":"_doc","_id":"7","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":28,"_primary_term":2}
*************************************
Accessing newly created document:  {"_index":"twitter","_type":"_doc","_id":"7","_version":1,"_seq_no":28,"_primary_term":2,"found":true,"_source":{"query": {"user": "NewUser", "post_date": "2009-11-15T14:12:12", "message": "trying out Elasticsearch from python"}}}

We will have a look at more complex queries from Python in a later post.

How to use Elasticsearch API

Elasticsearch exposes REST APIs that are used by the UI components and can be called directly to configure and access Elasticsearch features.

Show current indices

http://localhost:9200/_cat/indices

Output

yellow open test-index                   OeILxjmFRhODhztu4GgE_w 1 1    1 0   4.5kb   4.5kb
yellow open twitter                      BUmjGbLpTNCbnDIPxht9vA 1 1    1 0   3.7kb   3.7kb
yellow open bookindex                    RKq8oJKvRb2HQHxuPZdEbw 1 1    1 0   5.4kb   5.4kb
green  open kibana_sample_data_ecommerce FkD1obNSSK6mfLDsy9ILPQ 1 0 4675 0   4.9mb   4.9mb
green  open .kibana_task_manager_1       pc_LXegKQWu9690vT1Z-pA 1 0    2 1  16.9kb  16.9kb
yellow open facebook                     4Wzax6UhThm5rXi03PiG7w 1 1    1 0   3.5kb   3.5kb
green  open .apm-agent-configuration     sxoXMmsoS4mRPyjcaPByzw 1 0    0 0    283b    283b
green  open .kibana_1                    pK7pO-SCSBORonZLDi8Vew 1 0   60 2 949.5kb 949.5kb

This command and the ones below can be run from Kibana, Postman, a browser, or cURL.

Curl

$ curl "localhost:9200/_cat/indices"
yellow open ecommpediatest               R6WypBc2RtCS_ITA40_rFw 1 1    0 0    283b    283b
yellow open test-index                   OeILxjmFRhODhztu4GgE_w 1 1    1 0   4.5kb   4.5kb
yellow open twitter                      BUmjGbLpTNCbnDIPxht9vA 1 1   11 5  26.9kb  26.9kb
yellow open bookindex                    RKq8oJKvRb2HQHxuPZdEbw 1 1    1 0   5.4kb   5.4kb
green  open .kibana_task_manager_1       pc_LXegKQWu9690vT1Z-pA 1 0    2 1  16.9kb  16.9kb
green  open kibana_sample_data_ecommerce FkD1obNSSK6mfLDsy9ILPQ 1 0 4675 0   4.9mb   4.9mb
green  open .apm-agent-configuration     sxoXMmsoS4mRPyjcaPByzw 1 0    0 0    283b    283b
yellow open facebook                     4Wzax6UhThm5rXi03PiG7w 1 1    2 0     7kb     7kb
green  open .kibana_1                    pK7pO-SCSBORonZLDi8Vew 1 0   60 2 945.9kb 945.9kb
yellow open twitter1                     3iMzcYoJR3qY9JiM2sY8_g 1 1    1 0   4.5kb   4.5kb

Kibana

Postman

Browser

How to create a new index

PUT /ecommpediatest

The same can be achieved by creating a new document, since indexing a document into an index that does not exist creates that index automatically:

PUT /twitter1/_doc/1
{
    "user" : "kimchy updated",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
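
The same index creation can also be done from Python; here is a minimal sketch with requests (the index name is just an example):

import requests

# PUT to the index name creates an empty index; Elasticsearch answers with
# {"acknowledged":true,...} or with an error if the index already exists
uri='http://localhost:9200/ecommpediatest'
headers1 ={'Content-Type': 'application/json'}
response = requests.put(uri,headers=headers1)
print("Create index output : ", response.text)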

Creating or updating records

------------------- Get records
GET twitter/_doc/1
HEAD twitter/_doc/1
GET twitter/_source/10
HEAD twitter/_source/1
-------------------update API
PUT /twitter/_doc/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

POST /twitter/_doc/2
{
    "user" : "tom",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
PUT /twitter/_create/2
{
    "user" : "Jerry",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
POST /twitter/_create/3
{
    "user" : "cartoon",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
-------------------delete
DELETE /twitter/_doc/1
-------------------- create
PUT /twitter/_doc/1
{
    "user" : "one",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

POST /twitter/_doc/
{
    "user" : "one",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

PUT /twitter/_create/aa
{
    "user" : "xxx",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
POST /twitter/_create/ss
{
    "user" : "ssss",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
------------------------------------------------------search
GET /_search
{
    "query": {
        "ids" : {
            "values" : ["1", "ss", "100"]
        }
    }
}
GET /twitter/_search
{
    "query": {
        "ids" : {
            "values" : [11, "ss"]
        }
    }
}
GET /twitter/_search
{
    "query": {
        "match_all": {}
    }
}

GET /twitter/_search
{
    "query": {
        "match": { "user":"ssss"}
    }
}
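
For completeness, the same create / read / delete round trip can be scripted from Python with requests; here is a minimal sketch (document id 100 is just an example):

import requests
import json

headers1 ={'Content-Type': 'application/json'}
doc = json.dumps(
    {
    "user" : "one",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
    }
)

# Create (or overwrite) the document with an explicit id
response = requests.put('http://localhost:9200/twitter/_doc/100',headers=headers1,data=doc)
print("Create : ", response.text)

# Read it back
response = requests.get('http://localhost:9200/twitter/_doc/100',headers=headers1)
print("Read   : ", response.text)

# Delete it again
response = requests.delete('http://localhost:9200/twitter/_doc/100',headers=headers1)
print("Delete : ", response.text)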


How to Install and Configure Elasticsearch on Ubuntu

Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. Elasticsearch is the central component of the Elastic Stack, a set of open source tools for data ingestion, enrichment, storage, analysis, and visualization, commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana).

Elasticsearch is known for its simple REST APIs, distributed nature, speed, and scalability. You can use HTTP methods such as GET, POST, PUT, and DELETE in combination with a URL to manipulate your data.

Prerequisites

Elasticsearch is developed in Java, so you need Java installed on your Ubuntu machine. Use the following command to check whether Java is installed and which version you have.

$ java --version
openjdk 11.0.5 2019-10-15
OpenJDK Runtime Environment (build 11.0.5+10-post-Ubuntu-0ubuntu1.118.04)
OpenJDK 64-Bit Server VM (build 11.0.5+10-post-Ubuntu-0ubuntu1.118.04, mixed mode, sharing)

Now check whether the $JAVA_HOME variable is configured by printing its value with the following command. If it returns blank, you need to set it up. See the post How to set Java environment path $JAVA_HOME in Ubuntu for setting up $JAVA_HOME.

$ echo $JAVA_HOME
/usr/java/java-1.11.0-openjdk-amd6

Setting up Elasticsearch

The packages are signed with the Elasticsearch signing key. Download and install the public signing key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

You may need to install the apt-transport-https package on Ubuntu before proceeding:

$ sudo apt-get install apt-transport-https
Reading package lists... Done
Building dependency tree 
Reading state information... Done
apt-transport-https is already the newest version (1.6.12).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Now save the repository definition to /etc/apt/sources.list.d/elastic-7.x.list:

$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
deb https://artifacts.elastic.co/packages/7.x/apt stable main

You can install the Elasticsearch Debian package with:

sudo apt-get update 
sudo apt-get install elasticsearch

Configure Elasticsearch

Elasticsearch configuration files are located in the directory /etc/elasticsearch. There are two files:

  1. elasticsearch.yml configures the Elasticsearch server settings.
  2. logging.yml provides the configuration for logging.

Let us update elasticsearch.yml

sudo nano /etc/elasticsearch/elasticsearch.yml

There are multiple variables, but you only need to update the following ones:

  • cluster.name: specifies the name of the cluster.
  • node.name: specifies the name of the server (node). Please note the double quotes around the value.

Please do not change network.host. You might see that in some posts people have changed network.host to 0.0.0.0 or 127.0.0.1. The cluster coordination algorithm changed in 7.0 and, to be safe, it requires some specific configuration. This requirement is relaxed when you bind to localhost only, but if and when you change network.host, Elasticsearch enforces that you configure the cluster safely.

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: mycluster1
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: "node-1"
#
# Add custom attributes to the node:
#
#node.attr.rack: r1


...

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#

Launch Elasticsearch

You can start and stop Elasticsearch using the following commands:

sudo systemctl start elasticsearch.service
sudo systemctl stop elasticsearch.service

Testing Elasticsearch

You can check connectivity using curl. Install curl if you don't have it on your machine and run the following command:

$ curl -X GET "http://localhost:9200/?pretty"
{
  "name" : "node-1",
  "cluster_name" : "mycluster1",
  "cluster_uuid" : "Q6GqVVJfRZO6KSHQ-pFbcQ",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
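
The same connectivity check can be done from Python once the requests package is installed; here is a minimal sketch:

import requests

# The root endpoint returns the node name, cluster name and version shown above
response = requests.get('http://localhost:9200/?pretty')
print(response.text)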