Saturday 14 November 2020

Elasticsearch/Logstash/Kibana (ELK) - Part 5

Filebeat

Whether you're collecting from security devices, cloud, containers, hosts, or OT, Filebeat helps you keep the simple things simple by offering a lightweight way to forward and centralize logs and files. (Elasticsearch)

How does Filebeat work?

How to install Filebeat?
From the Kibana console, go to Logging -> Apache Metrics, then download Filebeat and configure it.


I modified the Logstash config file so that its input comes from Filebeat (the beats input listening on port 5044):
cat apache-filebeat.conf

input
{
        beats {
                port => 5044
        }
}

filter
{
        grok{
                match => {
                        "message" => "%{COMBINEDAPACHELOG}"
                }
        }
        mutate{
                convert => { "bytes" => "integer" }
        }
        date {
                match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
                locale => en
                remove_field => "timestamp"
        }
        geoip {
                source => "clientip"
        }
        useragent {
                source => "agent"
                target => "useragent"
        }
}

output
{
        stdout {
                codec => dots
        }
        elasticsearch {
                hosts => ["http://localhost:9200"]
                index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
        }
}
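To make the index setting in the output section easier to read, here is a minimal Python sketch of how such a name resolves for one event. It is an illustration, not Logstash code: the function name resolve_index is hypothetical, and the Filebeat version "6.3.2" is an assumption (the post only shows the Logstash version). Logstash formats %{+YYYY.MM.dd} from the event's @timestamp in UTC.

```python
from datetime import datetime, timezone

def resolve_index(metadata, timestamp):
    """Mimic "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"."""
    # The date part comes from the event timestamp, not from the wall clock.
    day = timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return "{}-{}-{}".format(metadata["beat"], metadata["version"], day)

# Hypothetical metadata, as Filebeat would attach it to an event.
metadata = {"beat": "filebeat", "version": "6.3.2"}
event_time = datetime(2020, 11, 14, 17, 18, 12, tzinfo=timezone.utc)

index_name = resolve_index(metadata, event_time)
print(index_name)
```

With one index per day, an index pattern such as filebeat-* in Kibana covers all of them.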

Start Logstash from a new terminal and check its log output to confirm that the Beats input is listening on port 5044.

C:\elk\logstash>bin\logstash.bat -f C:\elk\data\apache-filebeat.conf
Sending Logstash's logs to C:/elk/logstash/logs which is now configured via log4j2.properties
[2020-11-14T17:18:12,918][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-11-14T17:18:13,368][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.3.2"}
[2020-11-14T17:18:16,372][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2020-11-14T17:18:16,784][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2020-11-14T17:18:16,784][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2020-11-14T17:18:16,971][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2020-11-14T17:18:17,018][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2020-11-14T17:18:17,018][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2020-11-14T17:18:17,049][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[2020-11-14T17:18:17,065][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2020-11-14T17:18:17,096][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2020-11-14T17:18:17,252][INFO ][logstash.filters.geoip   ] Using geoip database {:path=>"C:/elk/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-geoip-5.0.3-java/vendor/GeoLite2-City.mmdb"}

[2020-11-14T17:18:17,811][INFO ][logstash.inputs.beats    ] Beats inputs: Starting input listener {:address=>"0.0.0.0:5044"}
[2020-11-14T17:18:17,827][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x7758f1a6 run>"}
[2020-11-14T17:18:17,936][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-11-14T17:18:17,943][INFO ][org.logstash.beats.Server] Starting server on port: 5044
[2020-11-14T17:18:18,208][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}


Now modify the Filebeat configuration.
Config file: filebeat.yml
Change these settings in the config file:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\elk\data\logs\*


Comment out the Elasticsearch output and uncomment the Logstash output:

#----------------------------- Logstash output --------------------------------
output.logstash:
  hosts: ["localhost:5044"]


Save and quit, then start Filebeat.
Open a new terminal and run the command below:
elk/filebeat>filebeat.exe



Go to the Kibana console and open Management; you will find the indices built by Logstash.

All your data is now being sent from the host to the Logstash server by Filebeat.
That's all for ELK.



Elasticsearch/Logstash/Kibana (ELK) - Part 4

Installing Logstash with Kibana

Install Logstash and start the service. You can test that it is working with the command below:
bin/logstash -e 'input { stdin {} } output { stdout {} }'

This means whatever you type is taken as input by Logstash and echoed back on standard output.

I will take an Apache log file and create a Logstash config file that parses the data and loads it into an Elasticsearch index, which can then be searched from Kibana.

Place this config file in your elk folder: create a folder named data there and put the file in it.

Reference:

input
{
        file {
                path => "C:\elk\data\logs\logs"
                type => "logs"
                start_position => "beginning"
        }
}

filter
{
        grok{
                match => {
                        "message" => "%{COMBINEDAPACHELOG}"
                }
        }
        mutate{
                convert => { "bytes" => "integer" }
        }
        date {
                match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
                locale => en
                remove_field => "timestamp"
        }
        geoip {
                source => "clientip"
        }
        useragent {
                source => "agent"
                target => "useragent"
        }
}

output
{
        stdout {
                codec => dots
        }
        elasticsearch {
        }
}
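To make the filter section more concrete, here is a rough Python sketch of what the grok, mutate and date steps do to a single line. It is not how Logstash is implemented: the regular expression is a simplified stand-in for the COMBINEDAPACHELOG pattern, parse_line is a hypothetical helper, and the sample line is made up for illustration.

```python
import re
from datetime import datetime

# Simplified stand-in for %{COMBINEDAPACHELOG}; the real grok pattern is more permissive.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # mutate { convert => { "bytes" => "integer" } }
    if event["bytes"] == "-":
        del event["bytes"]          # no byte count was logged
    else:
        event["bytes"] = int(event["bytes"])
    # date { match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ] remove_field => "timestamp" }
    event["@timestamp"] = datetime.strptime(event.pop("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
    return event

# Made-up sample line in Apache combined log format.
sample = ('203.0.113.7 - - [17/May/2015:10:05:03 +0000] "GET /index.html HTTP/1.1" '
          '200 1024 "-" "Mozilla/5.0"')
event = parse_line(sample)
print(event["clientip"], event["response"], event["bytes"], event["@timestamp"].isoformat())
```

The geoip and useragent filters then enrich the event further, using the clientip and agent fields extracted here.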

Download the sample Apache log (https://github.com/elastic/elk-index-size-tests/blob/master/logs.gz), save it at the path given in the config above (C:\elk\data\logs\logs), and start Logstash. The parsed events are sent to Elasticsearch, since that is the configured output, and can then be viewed from the Kibana console.

C:\elk\logstash>bin\logstash.bat -f C:\elk\data\apache.conf
Sending Logstash's logs to C:/elk/logstash/logs which is now configured via log4j2.properties
[2020-11-14T09:41:21,710][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"C:/elk/logstash/data/queue"}
[2020-11-14T09:41:21,722][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.dead_letter_queue", :path=>"C:/elk/logstash/data/dead_letter_queue"}
[2020-11-14T09:41:21,880][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-11-14T09:41:21,927][INFO ][logstash.agent           ] No persistent UUID file found. Generating new UUID {:uuid=>"ec39c09a-712e-4d86-a9d8-ab629546e04f", :path=>"C:/elk/logstash/data/uuid"}
[2020-11-14T09:41:22,721][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.3.2"}
[2020-11-14T09:41:27,029][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2020-11-14T09:41:27,751][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://127.0.0.1:9200/]}}
[2020-11-14T09:41:27,767][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://127.0.0.1:9200/, :path=>"/"}
[2020-11-14T09:41:28,095][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://127.0.0.1:9200/"}
[2020-11-14T09:41:28,189][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2020-11-14T09:41:28,189][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2020-11-14T09:41:28,236][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//127.0.0.1"]}
[2020-11-14T09:41:28,251][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2020-11-14T09:41:28,298][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2020-11-14T09:41:28,392][INFO ][logstash.outputs.elasticsearch] Installing elasticsearch template to _template/logstash
[2020-11-14T09:41:28,801][INFO ][logstash.filters.geoip   ] Using geoip database {:path=>"C:/elk/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-geoip-5.0.3-java/vendor/GeoLite2-City.mmdb"}
[2020-11-14T09:41:30,949][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x58a3a982 run>"}
[2020-11-14T09:41:31,074][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-11-14T09:41:31,851][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
.....................................................................................................
.
.
Each dot represents one processed event. Once the dots stop, you can stop Logstash with Ctrl-C. You can confirm in the Kibana console that the indices created by Logstash are present.
[ ctrl-c]

..................................................................[2020-11-14T10:46:36,690][WARN ][logstash.runner          ] SIGINT received. Shutting down.
[2020-11-14T10:46:38,334][INFO ][logstash.pipeline        ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x58a3a982 run>"}
Terminate batch job (Y/N)? y


C:\elk\logstash>



You can also check from the Elasticsearch URL.

Now log in to Kibana, go to Discover, and create an index pattern using @timestamp (select the range from 01-June-2014 to 20-July-2014) to see the data charted. Once the data is loaded, play around with it: create your own visualizations and present them in a dashboard.



We can create metrics for various fields and add them to a single dashboard.
Here I created pie charts, bar charts, and geo-location graphs in the dashboard for the total requests during the selected time range.

We could index more data and create more dashboards to provide more insights.

Friday 13 November 2020

Elasticsearch/Logstash/Kibana (ELK) - Part 3

Elasticsearch Query & Practice

Everything in Elasticsearch goes over HTTP, and the request body uses JSON to express the query. The query language is called the Query DSL. It matches documents against your search criteria and assigns each a relevancy score; for example, a document with more occurrences of the search words is considered more relevant.

Syntax for DSL components

Query: The query context is used for full-text searches. It decides whether a document matches the search criteria and also how well it matches, expressed as the relevancy score.

Filter: The filter context is mostly used for filtering structured data. It only answers yes or no and does not contribute to the score.

The query context and the filter context can be combined to form one larger query.
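As a small illustration of combining the two contexts, the Python sketch below builds one request body with a scored match clause (query context) and an unscored range clause (filter context). The field names come from the courses practice data in this post; the threshold of 20 students is an arbitrary choice for the example, and the script only prints the JSON rather than sending it.

```python
import json

# Query context: "must" clauses are scored by relevance.
# Filter context: "filter" clauses only include or exclude documents.
body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"name": "accounting"}}
            ],
            "filter": [
                {"range": {"students_enrolled": {"gte": 20}}}
            ]
        }
    }
}

# This JSON can be pasted into the Kibana console under: GET /courses/_search
request_json = json.dumps(body, indent=2)
print(request_json)
```

Moving a clause from must to filter keeps the same set of matching documents but stops that clause from influencing _score.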

Index Creation and Query using DSL

Create the 10 records below to get hands-on practice with querying Elasticsearch. Create them one by one in the Kibana console so that you have data to run the searches against.

PUT /courses/classroom/1
{
    "name": "Accounting 101",
    "room": "E3",
    "professor": {
        "name": "Thomas Baszo",
        "department": "finance",
        "facutly_type": "part-time",
        "email": "baszot@onuni.com"
        },
    "students_enrolled": 27,
    "course_publish_date": "2015-01-19",
    "course_description": "Act 101 is a course from the business school on the introduction to accounting that teaches students how to read and compose basic financial statements"
}

PUT /courses/classroom/2
{
    "name": "Marketing 101",
    "room": "E4",
    "professor": {
        "name": "William Smith",
        "department": "finance",
        "facutly_type": "part-time",
        "email": "wills@onuni.com"
        },
    "students_enrolled": 18,
    "course_publish_date": "2015-06-21",
    "course_description": "Mkt 101 is a course from the business school on the introduction to marketing that teaches students the fundamentals of market analysis, customer retention and online advertisements"
}

PUT /courses/classroom/3
{
    "name": "Anthropology 230",
    "room": "G11",
    "professor": {
        "name": "Devin Cranford",
        "department": "history",
        "facutly_type": "full-time",
        "email": "devinc@onuni.com"
        },
    "students_enrolled": 22,
    "course_publish_date": "2013-08-27",
    "course_description": "Ant 230 is an intermediate course on human societies and cultures and their development. A focus on the Mayans civilization is rooted in this course"
}

PUT /courses/classroom/4
{
    "name": "Computer Science 101",
    "room": "C12",
    "professor": {
        "name": "Gregg Payne",
        "department": "engineering",
        "facutly_type": "full-time",
        "email": "payneg@onuni.com"
        },
    "students_enrolled": 33,
    "course_publish_date": "2013-08-27",
    "course_description": "CS 101 is a first year computer science introduction teaching fundamental data structures and alogirthms using python. "
}

PUT /courses/classroom/5
{
    "name": "Theatre 410",
    "room": "T18",
    "professor": {
        "name": "Sebastian Hern",
        "department": "art",
        "facutly_type": "part-time",
        "email": ""
        },
    "students_enrolled": 47,
    "course_publish_date": "2013-01-27",
    "course_description": "Tht 410 is an advanced elective course disecting the various plays written by shakespere during the 16th century"
}

PUT /courses/classroom/6
{
    "name": "Cost Accounting 400",
    "room": "E7",
    "professor": {
        "name": "Bill Cage",
        "department": "accounting",
        "facutly_type": "full-time",
        "email": "cageb@onuni.com"
        },
    "students_enrolled": 31,
    "course_publish_date": "2014-12-31",
    "course_description": "Cst Act 400 is an advanced course from the business school taken by final year accounting majors that covers the subject of business incurred costs and how to record them in financial statements"
}

PUT /courses/classroom/7
{
    "name": "Computer Internals 250",
    "room": "C8",
    "professor": {
        "name": "Gregg Payne",
        "department": "engineering",
        "facutly_type": "part-time",
        "email": "payneg@onuni.com"
        },
    "students_enrolled": 33,
    "course_publish_date": "2012-08-20",
    "course_description": "cpt Int 250 gives students an integrated and rigorous picture of applied computer science, as it comes to play in the construction of a simple yet powerful computer system. "
}

PUT /courses/classroom/8
{
    "name": "Accounting Info Systems 350",
    "room": "E3",
    "professor": {
        "name": "Bill Cage",
        "department": "accounting",
        "facutly_type": "full-time",
        "email": "cageb@onuni.com"
        },
    "students_enrolled": 19,
    "course_publish_date": "2014-05-15",
    "course_description": "Act Sys 350 is an advanced course providing students a practical understanding of an accounting system in database technology. Students will use MS Access to build a transaction ledger system"
}

PUT /courses/classroom/9
{
    "name": "Tax Accounting 200",
    "room": "E7",
    "professor": {
        "name": "Thomas Baszo",
        "department": "finance",
        "facutly_type": "part-time",
        "email": "baszot@onuni.com"
        },
    "students_enrolled": 17,
    "course_publish_date": "2016-06-15",
    "course_description": "Tax Act 200 is an intermediate course covering various aspects of tax law"
}

PUT /courses/classroom/10
{
    "name": "Capital Markets 350",
    "room": "E3",
    "professor": {
        "name": "Thomas Baszo",
        "department": "finance",
        "facutly_type": "part-time",
        "email": "baszot@onuni.com"
        },
    "students_enrolled": 13,
    "course_publish_date": "2016-01-11",
    "course_description": "This is an advanced course teaching crucial topics related to raising capital and bonds, shares and other long-term equity and debt financial instrucments"
}


The most basic kind of query in Elasticsearch is the match query. Check the _score of each hit to see how relevant the document is.
There are more examples in this section for practice purposes.

GET /courses/_search
{
  "query":{
    "match_all": {}
  }
}

GET /courses/_search
{
  "query":{
    "match": {"name":"computer"}
  }
}

PUT /courses/classroom/5
{
    "name": "Theatre 410",
    "room": "T18",
    "professor": {
        "name": "Sebastian Hern",
        "department": "art",
        "facutly_type": "part-time"
        },
    "students_enrolled": 47,
    "course_publish_date": "2013-01-27",
    "course_description": "Tht 410 is an advanced elective course disecting the various plays written by shakespere during the 16th century"
}

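The exists query returns all the documents that have a value for the given field. Document 5 was re-indexed above without professor.email, so it is left out of the results.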

GET /courses/_search
{
  "query":{
    "exists": {"field":"professor.email"}
  }
}

Match multiple criteria

GET /courses/_search
{
  "query":{
    "bool": {
      "must": [
        {"match": {"name":"computer"}},
        {"match": {"room": "c8"}}
      ]
    }
  }
}


Multi-match: search for one term across multiple fields

GET /courses/_search
{
  "query":{
    "multi_match":{
      "fields": ["name","professor.name"],
      "query": "accounting"
    }
  }
}


match_phrase searches for the exact phrase, with the words in the same order.

GET /courses/_search
{
  "query":{
    "match_phrase":{
      "course_description": "financial statements"
    }
  }
}


match_phrase_prefix also matches when the last word of the phrase is only a prefix.

GET /courses/_search
{
  "query":{
    "match_phrase_prefix":{
      "course_description": "financial statements"
    }
  }
}


Search in range

GET courses/_search
{
  "query": {
    "range": {
      "students_enrolled": {
        "gte": 10,
        "lte": 15
      }
    }
  }
}

Search for date

GET courses/_search
{
  "query": {
    "range": {
      "course_publish_date": {
        "gte": 2013
      }
    }
  }
}


Combine two or more queries.

GET courses/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "accounting"
        }}
      ]
      , "must_not": [
        {"match": {
          "room": "e7"
        }}
      ]
      , "should": [
        {"range": {
          "students_enrolled": {
            "gte": 10,
            "lte": 20
          }
        }}
      ]
    }
  }
}


Filters

GET courses/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must":[
            {
              "range":{
                "students_enrolled":{
                  "gte": 30
                }
              }
            }
          ]
        }
      }
    }
  }
}


You can create a new index to practice aggregations and filters.
Bulk Indexing

POST /vehicles/cars/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "white", "make" : "honda", "sold" : "2016-10-28", "condition": "okay"}
{ "index": {}}
{ "price" : 20000, "color" : "white", "make" : "honda", "sold" : "2016-11-05", "condition": "new" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2016-05-18", "condition": "new" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2016-07-02", "condition": "good" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2016-08-19" , "condition": "good"}
{ "index": {}}
{ "price" : 18000, "color" : "red", "make" : "dodge", "sold" : "2016-11-05", "condition": "good"  }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2016-01-01", "condition": "new"  }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2016-08-22", "condition": "new"  }
{ "index": {}}
{ "price" : 10000, "color" : "gray", "make" : "dodge", "sold" : "2016-02-12", "condition": "okay" }
{ "index": {}}
{ "price" : 19000, "color" : "red", "make" : "dodge", "sold" : "2016-02-12", "condition": "good" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "chevrolet", "sold" : "2016-08-15", "condition": "good" }
{ "index": {}}
{ "price" : 13000, "color" : "gray", "make" : "chevrolet", "sold" : "2016-11-20", "condition": "okay" }
{ "index": {}}
{ "price" : 12500, "color" : "gray", "make" : "dodge", "sold" : "2016-03-09", "condition": "okay" }
{ "index": {}}
{ "price" : 35000, "color" : "red", "make" : "dodge", "sold" : "2016-04-10", "condition": "new" }
{ "index": {}}
{ "price" : 28000, "color" : "blue", "make" : "chevrolet", "sold" : "2016-08-15", "condition": "new" }
{ "index": {}}
{ "price" : 30000, "color" : "gray", "make" : "bmw", "sold" : "2016-11-20", "condition": "good" }

Aggregation
Until now we were only searching; now we will do data analytics to gain insights into the data.

Search all data, sorted by price in descending order (first 5 results).

GET /vehicles/cars/_search
{
  "from": 0,
  "size": 5

  , "query": {
      "match_all": {}
  }
  , "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}


Create an aggregation over all the cars whose color is 'red', grouped by make, with the average, maximum, and minimum price per make.

GET /vehicles/cars/_search
{
  "size": 1,
  "query": {
    "match": {
      "color": "red"
    }
  },

  "aggs": {
    "popular_cars": {
      "terms": {
        "field": "make.keyword",
        "size": 10
      }
      , "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
          "max_price": {
          "max": {
            "field": "price"
          }
        },
          "min_price": {
          "min": {
            "field": "price"
          }
        }
      }
    }
  }
}
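To see what the aggregation above computes, here is a plain-Python sketch of the same logic on a subset of the bulk data from this post (the five red cars plus two others). It is only a mental model: the function name popular_cars mirrors the aggregation name and is not an Elasticsearch API, and buckets are ordered by document count, then by key.

```python
from collections import defaultdict

# Subset of the vehicles data indexed earlier in this post.
cars = [
    {"price": 18000, "color": "red", "make": "dodge"},
    {"price": 80000, "color": "red", "make": "bmw"},
    {"price": 19000, "color": "red", "make": "dodge"},
    {"price": 20000, "color": "red", "make": "chevrolet"},
    {"price": 35000, "color": "red", "make": "dodge"},
    {"price": 10000, "color": "white", "make": "honda"},
    {"price": 30000, "color": "green", "make": "ford"},
]

def popular_cars(docs, color):
    """Filter by color (the query), bucket by make (terms), then compute price metrics."""
    prices_by_make = defaultdict(list)
    for doc in docs:
        if doc["color"] == color:
            prices_by_make[doc["make"]].append(doc["price"])
    buckets = [
        {
            "key": make,
            "doc_count": len(prices),
            "avg_price": sum(prices) / len(prices),
            "max_price": max(prices),
            "min_price": min(prices),
        }
        for make, prices in prices_by_make.items()
    ]
    # Largest buckets first, ties broken by key.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))

buckets = popular_cars(cars, "red")
for bucket in buckets:
    print(bucket)
```

The nested "aggs" in the request play the role of the per-bucket metrics computed inside the list comprehension.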

Create buckets by sold-date range within each make, with the average price per bucket.

GET /vehicles/cars/_search
{
  "aggs": {
    "popular_cars": {
      "terms": {
        "field": "make.keyword",
        "size": 10
      }
      , "aggs": {
        "sold_date_range": {
          "range": {
            "field": "sold",
            "ranges": [
              {
                "from": "2016-01-01",
                "to": "2016-05-18"
              },
              {
                "from": "2016-05-18",
                "to": "2017-01-01"
              }
            ]
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}


Group the cars by condition, with the average price per condition and, within each condition, the min and max price per make.

GET /vehicles/cars/_search
{
  "aggs": {
    "car_condition": {
      "terms": {
        "field": "condition.keyword",
        "size": 10
      }
      , "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "make":{
          "terms": {
            "field": "make.keyword",
            "size": 10
          },
           "aggs":{
            "min_price": { "min": { "field": "price" }},
            "max_price": { "max": { "field": "price" }}
          }
        }
      }
    }
  }
}



Elasticsearch/Logstash/Kibana (ELK) - Part 2

Index Creation

What happens when we send a request to index a document or to query documents in Elasticsearch?

As you know, Elasticsearch is a distributed technology. If we have a really large search problem, we can break it up into small ones and hand them out to several computers, which work in parallel and solve the problem in a much shorter time than a single computer could.

An index is the logical, human-readable representation of the data, and it is very scalable. Under the hood it is made up of shards, so a particular index can be split into multiple shards.

Example:
documents 1 to 50 -> shard_0
documents 51 to 100 -> shard_1

Shards are distributed across one or more computers. Say Node 1 is one computer and Node 2 is another; both run Elasticsearch and, when on the same network, can communicate with one another (clustering). These smaller units of storage are called "shards".

Shards are placed on any of the nodes (computers), so that when a search request comes in, any node in the cluster can respond. For backup, you can configure replica shards when creating an index; a replica is an exact copy of a primary shard. Replicas are mainly used in production.

How does indexing take place?

The node takes the ID of the document we are trying to index and runs it through a hashing function; the result decides which shard the document is stored on.
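The idea of hashing a document ID to choose a shard can be sketched in a few lines of Python. This is a simplified model: Elasticsearch uses its own hash function on the routing value, whereas the sketch below swaps in zlib.crc32 purely because it is deterministic and in the standard library. The function name pick_shard is hypothetical.

```python
import zlib

def pick_shard(doc_id, number_of_shards):
    """Map a document ID to a shard number in the range 0..number_of_shards-1."""
    # crc32 is a stand-in for the real hash function (assumption for illustration).
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

# The same ID always lands on the same shard, so the document can be found again later.
placement = {doc_id: pick_shard(doc_id, 2) for doc_id in ["1", "2", "3", "sunil"]}
print(placement)
```

Because the shard number depends on the number of primary shards, that number cannot simply be changed after the index is created without re-distributing the documents.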

Shards
A shard is basically a container of inverted indices, also called segments. Segments live inside shards, and a shard can have multiple segments, each of which is an inverted index.

Inverted index:
A long list of words in alphabetical order; next to each word we record the documents in which that word occurred and its location within them. Each word is called a token, and the process of producing tokens is called tokenization.
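A toy version of an inverted index can be built in Python as follows. It is a sketch of the data structure only, with whitespace splitting and lowercasing standing in for real analysis; the document texts are course names from the practice data in this series, and build_inverted_index is a hypothetical helper.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each token to a list of (document id, position) pairs, sorted by token."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            index[token].append((doc_id, position))
    return dict(sorted(index.items()))

docs = {
    1: "Accounting 101",
    2: "Cost Accounting 400",
    3: "Computer Science 101",
}
inverted = build_inverted_index(docs)
for token, postings in inverted.items():
    print(token, postings)
```

Looking up a word is now a single dictionary access instead of a scan over every document, which is what makes search fast.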

As you're aware, an Elasticsearch index is made up of multiple shards, and that is how an index can span multiple nodes: the unit of separation is the shard.

The process of taking raw text and turning it into an inverted index is called analysis; it makes the data searchable and searches faster. Once the inverted index is formed, it goes into the memory buffer.

Say we send one document to Elasticsearch for indexing. It goes through the analysis step, an inverted index is created, and it is placed in the buffer. In the same way, another document sent to Elasticsearch goes through analysis, an inverted index is formed, and it is added to the buffer. Once the buffer fills up, it is committed to a segment, which is an immutable inverted index.
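The buffer-then-segment flow can be modelled with a small Python class. This is a deliberately simplified sketch: the Shard class and its buffer_limit parameter are hypothetical, and a real shard also commits on a timer rather than only when the buffer is full. The point it illustrates is that documents become searchable only after the buffer has been committed to an immutable segment.

```python
class Shard:
    """Toy shard: analyzed documents wait in a buffer until committed to a segment."""

    def __init__(self, buffer_limit=2):
        self.buffer_limit = buffer_limit   # hypothetical size threshold
        self.buffer = []                   # not yet searchable
        self.segments = []                 # committed, immutable

    def index(self, doc_id, tokens):
        self.buffer.append((doc_id, frozenset(tokens)))
        if len(self.buffer) >= self.buffer_limit:
            self.commit()

    def commit(self):
        if self.buffer:
            # A tuple cannot be modified afterwards, like a segment.
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def search(self, token):
        return [doc_id
                for segment in self.segments
                for doc_id, tokens in segment
                if token in tokens]

shard = Shard(buffer_limit=2)
shard.index(1, ["accounting", "101"])
before = shard.search("accounting")    # document 1 is still in the buffer
shard.index(2, ["cost", "accounting", "400"])
after = shard.search("accounting")     # the buffer was committed to a segment
print(before, after)
```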

This is the basis of making your data searchable. Once the segment is written, the shard is searchable, and you can rest assured that the data in it is not going to change: it was processed through analysis, inverted indices were formed, and they were committed to the segment. This is how each shard is formed.

So the magic happens in the analysis step, which turns a document into an inverted index.

Analysis Process

When we send a document to Elasticsearch, it goes through the analysis step. The objective of this step is to transform the document into an inverted index and store it in a shard segment. Analysis is key not only when indexing documents but also at query time, when we retrieve or read them.

Example:
We will put these two sentences through the analysis process:

"sentence 1"
"sentence 2"

While indexing these two sentences, we need to get rid of unnecessary information so that only the most important pieces of both documents are indexed. Breaking the text into those pieces is called tokenization.

This work is done by an analyzer, which has two parts:

- The first part is tokenization, done by a tokenizer.
- The second part is filtering.

The following steps are done during filtering:
- Removing stop words (e.g. the, a, is)
- Lowercasing
- Stemming (e.g. running -> run, swimming -> swim)
- Synonyms (e.g. thin, lean, skinny)

When text goes into the analyzer, say while indexing a document, it is first tokenized and then filtered; the tokens that come out are the ones that make it into the inverted index. This is the indexing step.
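The tokenize-then-filter pipeline can be sketched in Python as below. It is a toy analyzer, not the Elasticsearch standard analyzer: the stop-word list, the synonym table and the very naive stem function are all assumptions made for this example.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "and"}          # assumed, tiny list
SYNONYMS = {"lean": "thin", "skinny": "thin"}         # map synonyms to one term

def stem(token):
    """Very naive stemmer: strip a trailing 'ing' and undouble the last consonant."""
    if token.endswith("ing") and len(token) > 5:
        root = token[:-3]
        if len(root) >= 2 and root[-1] == root[-2]:
            root = root[:-1]
        return root
    return token

def analyze(text):
    tokens = re.findall(r"\w+", text)                     # 1. tokenization
    tokens = [t.lower() for t in tokens]                  # 2. lowercasing
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 3. stop-word removal
    tokens = [stem(t) for t in tokens]                    # 4. stemming
    return [SYNONYMS.get(t, t) for t in tokens]           # 5. synonyms

terms = analyze("The skinny dog is Running and swimming")
print(terms)
```

The same analyze function would be applied to the search text at query time, so that a search for "Running" and an indexed "run" end up as the same token.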


Define Custom Index Structure

We could define an index "hr". The index structure we work with is the logical representation of the actual data; the actual data resides on disk in shards. We are usually concerned with the logical representation of the index and how to load data into it.

Now let's get into the details of how the index structure can be defined if you have to create your index manually and define the different fields in the properties, and so on. So far Elasticsearch has created the index structure for us dynamically, on the fly. Now let's see how to create it manually.

PUT /customer
{
  "settings" : {
      "number_of_replicas" : 2,
      "number_of_shards": 1
    },
    "mappings" : {
        "online" : {
            "properties" : {
                "gender" : {
                  "type" : "text",
                  "analyzer": "standard"
                },
                "age": {
                  "type": "integer"
                },
                "total_spent": {
                  "type": "float"
                }
            }
        }
    }
}

Let's add data to the index we defined earlier.

PUT /customer/online/2343
{
  "gender": "male",
  "age": 22,
  "total_spent": 50000,
  "location": "Kashmir"
}

Notice that when we run GET /customer, the mapping now also contains "location": even though we did not include it when defining the index, Elasticsearch added it dynamically.

We can restrict this by setting the value of dynamic on the mapping:
- false: new fields are ignored (not added to the mapping or indexed).
- strict: indexing a document with a new field throws an error.

PUT /customer/_mapping/online
{
  "dynamic": "strict"
}
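The three behaviours of the dynamic setting can be compared with a small Python simulation. This is only a mental model and not Elasticsearch code: apply_dynamic is a hypothetical function, and the field names are taken from the customer index defined above.

```python
def apply_dynamic(mapped_fields, dynamic, doc):
    """Return (fields in the mapping afterwards, fields of doc that get indexed)."""
    unknown = [field for field in doc if field not in mapped_fields]
    if unknown and dynamic == "strict":
        # strict: the whole document is rejected
        raise ValueError("unmapped fields not allowed: {}".format(unknown))
    mapping = set(mapped_fields)
    if dynamic == "true":
        mapping |= set(unknown)        # true: new fields are added to the mapping
    # false: unknown fields are skipped for indexing, the mapping stays unchanged
    indexed = {field: value for field, value in doc.items() if field in mapping}
    return mapping, indexed

mapped = {"gender", "age", "total_spent"}
doc = {"gender": "male", "age": 22, "total_spent": 50000, "location": "Kashmir"}

mapping_true, indexed_true = apply_dynamic(mapped, "true", doc)
mapping_false, indexed_false = apply_dynamic(mapped, "false", doc)
try:
    apply_dynamic(mapped, "strict", doc)
    strict_rejected = False
except ValueError:
    strict_rejected = True
print(sorted(mapping_true), sorted(indexed_false), strict_rejected)
```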

Analyzers

Elasticsearch has a wide range of built-in analyzers, which can be used in any index without further configuration.
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/analysis-analyzers.html

Thursday 12 November 2020

Elasticsearch/Kubernetes/Logstash(ELK) - Part 1

In this article we will learn about ELK. 

Installing Elasticsearch & Kibana:
As part of the prerequisite, ensure you have installed Java.

Download Elasticsearch and Kibana from the Elastic releases page.

Unzip both downloaded files.
cd kibana*
vim config/kibana.yml
Search for elasticsearch.url in the config file and uncomment the line.
By default it points to Elasticsearch at localhost:9200.

Running Elasticsearch and Kibana
Always start Elasticsearch first: bin/elasticsearch
Then start Kibana: bin/kibana

Open your browser and point it to these URLs:
Elasticsearch: http://localhost:9200
Kibana: http://localhost:5601

To practise working with ELK, we can populate data into Elasticsearch, retrieve it, and delete it.
In Elasticsearch, data is stored in something called an index.

As an example, we will create an index called hr that stores documents of type employee, each created with its own id.
e.g.
<index>/<type>/<id>
/hr/employee/xyz

PUT /hr/employees/sunil
{
  "Name": "Sunil",
  "EmpID": "123"
}


HEAD returns only the status code of the call (200 if the document exists, 404 if it does not):
HEAD /hr/employees/sunil

Retrieve data:
GET /hr/employees/sunil

Update data:
POST /hr/employees/sunil/_update
{
  "doc":{
    "Location": "Bengaluru"
  }
}


Whenever a document is updated, Elasticsearch does not change individual attributes in place; it rewrites the whole document.
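A rough Python model of that behaviour: an update reads the current source, merges the partial document into it, and writes a complete new document with a higher version. The in-memory store and the put and update helpers are hypothetical stand-ins, not Elasticsearch APIs.

```python
store = {}  # hypothetical in-memory stand-in for an index


def put(doc_id, source):
    """Write a whole document, bumping its version."""
    version = store.get(doc_id, {}).get("_version", 0) + 1
    store[doc_id] = {"_version": version, "_source": dict(source)}
    return store[doc_id]


def update(doc_id, partial):
    """Fetch the current source, merge the partial doc, write a whole new document."""
    current = store[doc_id]["_source"]
    merged = {**current, **partial}
    return put(doc_id, merged)


put("sunil", {"Name": "Sunil", "EmpID": "123"})
result = update("sunil", {"Location": "Bengaluru"})
print(result["_version"])  # 2
print(result["_source"])   # {'Name': 'Sunil', 'EmpID': '123', 'Location': 'Bengaluru'}
```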

Delete data:
DELETE /hr/employees/sunil
This deletes only the document named in the call; the index itself still remains. To delete the whole index:

DELETE /hr

Index Components

GET /business
This index does not exist yet, so the call returns an error. Let's create it by indexing a document.

PUT /business/building/200
{
  "address": "498 Dave Street In",
  "floors": 3,
  "offices": 5,
  "loc": {
    "latitude": 23.2332,
    "longitute": 34.23233
  }
}


GET /business
You would get output like the below, whose main components are
aliases, mappings, and settings.

When we add more records with different fields, Elasticsearch dynamically adds the new fields to the mappings section.

{
  "business": {
    "aliases": {},
    "mappings": {},
    "settings": {}
  }
}

PUT /business/building/201
{
  "address": "498 Dave Street In",
  "floors": 3,
  "offices": 5,
  "price": 5000000,
  "loc": {
    "latitude": 23.2332,
    "longitute": 34.23233
  }
}


Note: an index can have only one type.
e.g. PUT /business/employees/232 would give an error, as /business is already associated with the type "building".
So create the document in a new index instead:

PUT /employees/_doc/200
{
  "Name": "Sunil",
  "title": "Senior Engineer",
  "joining_data": "Jan 01 2020"
}

PUT /employees/_doc/201
{
  "Name": "Ram",
  "title": "Senior Tech Engineer",
  "joining_data": "Jul 01 2000"
}


PUT /contracts/_doc/220
{
  "Name": "System Admins",
  "start_date": "Jan 10 2015",
  "employees": [200, 201]
}


Query data

GET business/building/_search
          or
GET business/_search


Search and get only the required record
GET business/_search
{
  "query": {
    "term": {
      "address": "498"
    }
  }
}


The actual request that the Kibana console sends to Elasticsearch looks like the below:

curl -X GET "http://localhost:9200/business/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "address": "498"
    }
  }
}'
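The same request can be built from Python's standard library. The sketch below only constructs the request object so that it can be inspected without a running cluster; build_search_request is a hypothetical helper name, and the host and index are the ones used in the curl command above.

```python
import json
import urllib.request


def build_search_request(host, index, field, value):
    """Build (but do not send) the term query shown in the curl command."""
    body = {"query": {"term": {field: value}}}
    return urllib.request.Request(
        url=f"{host}/{index}/_search?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = build_search_request("http://localhost:9200", "business", "address", "498")
print(req.get_method(), req.full_url)
# GET http://localhost:9200/business/_search?pretty

# Sending it requires a running Elasticsearch node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```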


In the next parts we will look further at text analysis for indexing and searching.


Sunday 8 November 2020

Prometheus Monitoring - Part 5

My previous articles mainly discussed alerts, Alertmanager, and notifications.

In this article, we will add Prometheus as a data source in Grafana for data visualisation.
  • Install & Configure Grafana
    • Install & Configure Nginx
    • Configure Nginx reverse proxy
    • Configure SSL
    • Register to DNS
  • Setup Prometheus DataSource
  • Setup Prometheus Dashboards
  • Create dashboards for node exporters  
Install & Configure Grafana
I am installing Grafana on Ubuntu 20.04, using the root account.

sudo apt update
sudo apt-get install -y adduser libfontconfig1
wget https://dl.grafana.com/oss/release/grafana_7.2.0_amd64.deb
sudo dpkg -i grafana_7.2.0_amd64.deb
sudo service grafana-server start
sudo systemctl enable grafana-server.service


Your Grafana server will be hosted at http://[your Grafana server ip]:3000
The default Grafana login is
Username :admin
Password :admin

If you need SSL, install nginx as a reverse proxy in front of Grafana and configure it to forward requests accordingly.

sudo apt install nginx -y
sudo vim /etc/nginx/sites-enabled/prometheus


server {
    listen 80;
    listen [::]:80;
    server_name  prometheus.YOUR-DOMAIN-NAME;

    location / {
        proxy_pass           http://localhost:3000/;
    }
}


Save the file and test that the new configuration has no errors:
nginx -t

Once nginx is restarted, open http://YOUR-DOMAIN-NAME in the browser.
Visiting your IP address directly will still show the default Nginx welcome page. You can remove it with rm /etc/nginx/sites-enabled/default

Restart nginx:
sudo service nginx restart
sudo service nginx status


Add SSL certificates to the grafana dashboards.

sudo snap install --classic certbot
sudo certbot --nginx

Once the certificates are installed, you can log in from the browser at https://grafana.domainname.com



Setup Prometheus DataSource

Once you have logged in to Grafana, go to "Configurations" → click on "DataSources" → select "Prometheus".
Fill in the settings required on that page.




Go to the Explore tab, where Prometheus is already selected, and run the query "go_threads".



Setup Prometheus Dashboards

Go to Configurations → DataSources → click on the data source you created → select Dashboards → Prometheus 2.0 Stats → click on Import.



Create dashboards for node exporters

In the Configurations section, choose Plugins → click on "Find more plugins on Grafana.com" → select "Dashboards" → select the [ English version ] dashboard → copy its ID: 11074

In the Grafana web page, select "Manage" under Dashboards → select "Import" → paste the ID: 11074




Prometheus Monitoring - Part 4

In the previous post we discussed PromQL and alerts:
https://sunlnx.blogspot.com/2020/11/promotheus-monitoring-part-3.html

In this article, we will discuss the following:

  • Prometheus Alert Manager
  • Configure SMTP local on Prometheus Server
  • Test Alerts

Prometheus Alert Manager

The AlertManager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.
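To make deduplicating and grouping concrete, here is a toy Python sketch. It is not Alertmanager's implementation: group_alerts is a hypothetical function, alerts are modelled as plain label dictionaries, and the instance names are made up.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop exact duplicate label sets, then bucket alerts by the `group_by` labels."""
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        fingerprint = tuple(sorted(labels.items()))
        if fingerprint in seen:
            continue  # same alert received again: deduplicate
        seen.add(fingerprint)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Assumed sample alerts; the second one is a duplicate of the first
alerts = [
    {"alertname": "InstanceDown", "instance": "node01:9100"},
    {"alertname": "InstanceDown", "instance": "node01:9100"},
    {"alertname": "InstanceDown", "instance": "node02:9100"},
    {"alertname": "DiskSpaceFree10Percent", "instance": "node01:9100"},
]
groups = group_alerts(alerts, ["alertname"])
print(len(groups[("InstanceDown",)]))            # 2
print(len(groups[("DiskSpaceFree10Percent",)]))  # 1
```

Each group would then be routed as a single notification to its receiver, so two instances going down produce one message rather than two.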

Install the Prometheus Alert Manager

sudo apt install prometheus-alertmanager
sudo service prometheus-alertmanager status
ps -u prometheus

Note that the service is running on port 9093

Visit http://[your domain name or ip]:9093/


Block Port 9093 for external requests

iptables -A INPUT -p tcp -s localhost --dport 9093 -j ACCEPT
iptables -A INPUT -p tcp --dport 9093 -j DROP
iptables -L

iptables-save > /etc/iptables/rules.v4
ip6tables-save > /etc/iptables/rules.v6

Add Alertmanager as a scrape target at the end of prometheus.yml:
tail -4 prometheus.yml
  - job_name: alertmanager
    static_configs:
      - targets: ['localhost:9093']

Verify the config and restart the service. Once both succeed, the Alertmanager target should appear in the Prometheus UI.



Configure SMTP for Alerts

Setup a simple local SMTP server which can only send emails from localhost.

sudo apt install mailutils
sudo vim /etc/postfix/main.cf

Go to the end of the file and set the following:
inet_interfaces = loopback-only
inet_protocols = ipv4


sudo systemctl restart postfix

Make sure your forward and reverse DNS records are correct; otherwise email providers are likely to treat the message as invalid and you won't receive it.
Once verified, run the command below from the Prometheus server:

echo "This is the body" | mail -s "This is the subject" -a "FROM:admin@yourdomainname" your@email-address

Check your mail account; you should have received the email.

Next, configure Alertmanager to send emails when the alerting rules fire and resolve.
cd /etc/prometheus
cp alertmanager.yml alertmanager_orig.yml
cat > alertmanager.yml
Paste the contents below, adjust them for your alerts, then press Ctrl-D to save.

route:

  receiver: smtp-local
receivers:
  - name: 'smtp-local'
    email_configs:
    - to: 'sunlnx@gmail.com'
      from: 'promoalertadmin@devtestlabs.in'
      require_tls: false
      #auth_username: 'alertmanager'
      #auth_password: 'password'
      #auth_secret: 'secret'
      #auth_identity: 'identity'
      smarthost: localhost:25
      send_resolved: true

Once an alert is in the Firing state, you should receive an email notification.




The Source link in the alert email points to localhost. To make it use the Prometheus server's external URL, edit the defaults file:
sudo vim /etc/default/prometheus
ARGS="--web.enable-admin-api --web.external-url=https://example.com"


Restart your Prometheus server for the change to take effect.
systemctl restart prometheus

Thanks.

Saturday 7 November 2020

Prometheus Monitoring - Part 3

Please check my previous posts, where we discussed the installation and configuration of scrape targets in detail.

In this post we will discuss the following Prometheus topics:
  • PromQL Queries
  • Saving/Recording Rules
  • Alerting Rules
PromQL Queries

The query language used in Prometheus is called PromQL (Prometheus Query Language). The data can either be viewed as a graph, as tabled data, or in external systems such as Grafana, Zabbix and others.

Simple examples

node_cpu_seconds_total{}
node_cpu_seconds_total{instance="promonode01.devtestlabs.in:80"} 

Regular Expressions
list only nodes with specific domains.

node_cpu_seconds_total{instance=~".*.devtestlabs.*"} 
node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq*"}
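One detail worth knowing is that PromQL regex matchers are fully anchored: the pattern has to match the whole label value, not just a part of it. The Python sketch below emulates that with re.fullmatch. Prometheus uses RE2 syntax rather than Python's re module, but for these simple patterns the two behave the same; match_label is a hypothetical helper name.

```python
import re


def match_label(pattern, value):
    """Emulate a PromQL =~ matcher: the pattern must match the entire value."""
    return re.fullmatch(pattern, value) is not None


instances = ["promonode01.devtestlabs.in:80", "localhost:9100"]
matched = [i for i in instances if match_label(r".*.devtestlabs.*", i)]
print(matched)  # ['promonode01.devtestlabs.in:80']

# Without the leading and trailing .* the same instance does not match
print(match_label(r"devtestlabs", "promonode01.devtestlabs.in:80"))  # False

# The mode matcher from the second query
print([m for m in ["irq", "softirq", "idle", "user"] if match_label(r".*irq*", m)])
# ['irq', 'softirq']
```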

Data Types

Scalar
A numeric floating point value

Instant vector
A set of time series containing a single sample for each time series.

Range Vector
A set of time series containing a range of data points over time for each time series.

node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq*"}[1m]
node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq*"}[5m]

Functions/Subfunctions
  • Start with this instant vector node_netstat_Tcp_InSegs{instance="localhost:9100"}
  • Convert it to a Range Vector and then convert it back to an instant vector using rate rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m])
  • Wrap it in the ceiling function ceil(rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m]))
  • Convert it to a range vector and get the per-second derivative of the time series deriv(ceil(rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m]))[1m:])
More Info: https://prometheus.io/docs/prometheus/latest/querying/functions/
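As a rough illustration of what rate followed by ceil does to a range vector, the Python sketch below computes the per-second increase between the first and last sample in a window. It is a simplification: Prometheus' rate also handles counter resets and extrapolates to the window boundaries, which this sketch ignores. The sample values and the simple_rate name are assumptions.

```python
import math


def simple_rate(samples):
    """Per-second increase between the first and last sample in the window.

    `samples` is a list of (unix_timestamp_seconds, counter_value) pairs.
    """
    if len(samples) < 2:
        return None  # not enough points in the window to compute a rate
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Assumed counter samples scraped every 15 seconds within a 1m window
samples = [(0, 1000.0), (15, 1030.0), (30, 1075.0), (45, 1120.0)]
per_second = simple_rate(samples)  # 120 segments over 45 seconds
print(math.ceil(per_second))       # 3
```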

Saving Rules
Typing a long query every time is tedious, so we can store its result instead: custom recording rules can be created and saved in a rules file.

Examples
Memory free percentage: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
Root disk space free percentage: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}

Log in to your Prometheus server and configure the rules as below:

vim /etc/prometheus/prometheus_rules.yml
groups:
  - name: custom_rules
    rules:
      - record: node_memory_free_percent
        expr: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}


Check the syntax:
promtool check rules prometheus_rules.yml
Checking prometheus_rules.yml
  SUCCESS: 2 rules found


Include these rules in the main prometheus configurations.
vim /etc/prometheus/prometheus.yml
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "prometheus_rules.yml"


Check the syntax of the main config file:

promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 1 rule files found

Checking prometheus_rules.yml
  SUCCESS: 2 rules found


Since everything checks out, restart the Prometheus server:
systemctl restart prometheus

You should now be able to see the custom rules in the dashboard.
 


Alerts
We will create a new group named alert_rules and add it to the same rules file we created earlier.
vim /etc/prometheus/prometheus_rules.yml
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      - alert: DiskSpaceFree10Percent
        expr: node_filesystem_free_percent <= 10
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} has 10% or less Free disk space"
          description: "{{ $labels.instance }} has only {{ $value }}% or less free."
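The effect of the for clause can be sketched as a small state machine: an alert whose expression is true stays pending until it has been true continuously for the given duration, and only then becomes firing. The Python below is an illustration under that assumption, not Prometheus' code; alert_states and the evaluation timestamps are made up.

```python
def alert_states(evaluations, for_seconds):
    """Return (timestamp, state) for each (timestamp, condition_is_true) pair."""
    active_since = None
    states = []
    for ts, is_true in evaluations:
        if not is_true:
            active_since = None  # condition cleared: the timer resets
            states.append((ts, "inactive"))
            continue
        if active_since is None:
            active_since = ts    # first evaluation where the condition holds
        state = "firing" if ts - active_since >= for_seconds else "pending"
        states.append((ts, state))
    return states


# InstanceDown uses "for: 1m", so it needs 60 seconds of continuous failure
evaluations = [(0, False), (15, True), (30, True), (60, True), (75, True), (90, False)]
for ts, state in alert_states(evaluations, 60):
    print(ts, state)
# 0 inactive
# 15 pending
# 30 pending
# 60 pending
# 75 firing
# 90 inactive
```

A rule without a for clause, such as DiskSpaceFree10Percent above, corresponds to for_seconds=0 in this sketch and fires on the first evaluation where the expression is true.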


Check the syntax of the config file and restart the service once you see the SUCCESS messages.

promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 1 rule files found

Checking prometheus_rules.yml
  SUCCESS: 3 rules found


sudo service prometheus restart
sudo service prometheus status