Desarrollo de Sistemas Informáticos

3rd year, 2nd semester. Information Technologies track. Degree in Computer Engineering. Academic year 2019/2020


Organization: ULL-ESIT-DSI-1920. Instructors: Casiano, Vicente, Manz

Elasticsearch

Elasticsearch is a search engine built on Apache Lucene.

Lucene

Lucene is a library that implements a full-text search engine. It is not an application but an API that provides search capabilities.

Inverted Indexes

An inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.

indices-invertidos.png

The idea is similar to the cross-reference indexes that usually appear at the end of books.
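
The following is a minimal Python sketch of the idea, not a description of how Lucene actually stores its index. The function name build_inverted_index and the sample documents are made up for illustration. It turns a forward index (document id to content) into a mapping from each lowercased word to the sorted ids of the documents that contain it.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents (a forward index: document id -> content).
docs = {
    1: "The Adventures of Tom Sawyer",
    2: "Adventures of Huckleberry Finn",
    3: "Tom Sawyer Abroad",
}

index = build_inverted_index(docs)
print(index["adventures"])  # [1, 2]
print(index["sawyer"])      # [1, 3]
```

Looking up a word is now a single dictionary access, while adding a document requires tokenizing it and updating every affected entry, which is the trade-off described above.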

Features provided by Lucene and features provided by Elasticsearch

/assets/images/lucene-vs-elasticsearch.png

Common Terms

Elasticsearch is built on Java 8.

Instructions on how to install Java 8 are available on Oracle’s website.

You can run java -version (or java --version on Java 9 and later) from the command line to confirm that Java is installed and ready.

$ java --version
java 9.0.4
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)

Installing the Elasticsearch Version Used in the Book

One way to install Elasticsearch is to go to the downloads page.

The book uses version 5.2, which is also available for download.

A quick start guide is also available.

Installing Version 6.4.2 (October 2018)

This is the version I used in my own installation, 6.4.2, to follow the book in late 2018 and early 2019:

$ elasticsearch --version
Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 6.4.2, Build: default/tar/04711c2/2018-09-26T13:34:09.098244Z, JVM: 9.0.4

Once you download the archive, unzip it and run bin/elasticsearch from the command line.

You should see a lot of output containing something like the following (much of the output is omitted here for brevity).

$ bin/elasticsearch
[INFO ][o.e.n.Node ] [] initializing ...
... many lines omitted ...
[INFO ][o.e.h.HttpServer ] [kAh7Q7Z] publish_address {127.0.0.1:9200},
    bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node            ] [kAh7Q7Z] started
[INFO ][o.e.g.GatewayService  ] [kAh7Q7Z] recovered [0] indices into
    cluster_state

Notice the publish_address and bound_addresses listed toward the end of the output. By default, Elasticsearch binds TCP port 9200 for its HTTP endpoint.
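
As a small illustration, the published HTTP address can be pulled out of such a log line with a regular expression. This is only a sketch: the helper name publish_address is hypothetical, and the pattern assumes an IPv4 address in the format shown in the output above.

```python
import re

# Matches e.g. "publish_address {127.0.0.1:9200}" and captures host and port.
# Assumption: an IPv4 host, as in the log output shown in these notes.
PUBLISH_RE = re.compile(r"publish_address \{([^:{}]+):(\d+)\}")


def publish_address(log_line):
    """Return (host, port) from a log line, or None if it has no publish_address."""
    match = PUBLISH_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


line = ("[INFO ][o.e.h.HttpServer ] [kAh7Q7Z] publish_address {127.0.0.1:9200}, "
        "bound_addresses {[::1]:9200}, {127.0.0.1:9200}")
print(publish_address(line))  # ('127.0.0.1', 9200)
```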

You can specify a lot of settings when setting up an Elasticsearch cluster. By default, it runs in development mode.

For a full discussion of the Elasticsearch cluster settings for version 5.2, see Elastic’s Important System Configuration page for 5.2. The same instructions are also available for the current version.

To have Elasticsearch in the PATH, I have added a small script in my ~/.bash_profile:

[~/campus-virtual/1819/ca1819/practicas(master)]$ cat ~/.bash_profile | sed -ne '/elastic/,/^$/p'
source ~/bin/elasticsearch-set

With these contents:

[~/campus-virtual/1819/ca1819/practicas(master)]$ cat ~/bin/elastic-search-set
export ES_HOME=~/Applications/elasticsearch-6.4.2
export PATH=$ES_HOME/bin:$PATH

Installing Version 7.5.0 (December 2019)

As of December 2019, the current version is 7.5.0.

Install Elasticsearch on macOS with Homebrew (December 2019)

Notes taken from https://www.elastic.co/guide/en/elasticsearch/reference/current/brew.html

Elastic publishes Homebrew formulae so you can install Elasticsearch with the Homebrew package manager.

To install with Homebrew, you first need to tap the Elastic Homebrew repository:

brew tap elastic/tap

Once you’ve tapped the Elastic Homebrew repo, you can use brew install to install the default distribution of Elasticsearch:

[~/.../transforming-data-and-testing-continuously-chapter-5/databases(master)]$ brew install elastic/tap/elasticsearch-full
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Updated Formulae
allure              bedtools            c-blosc             convox              csvq                golang-migrate      helmfile            micronaut           mitmproxy

==> Installing elasticsearch-full from elastic/tap
==> Downloading https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.0-darwin-x86_64.tar.gz?tap=elastic/homebrew-tap
######################################################################## 100.0%
==> codesign -f -s - /usr/local/Cellar/elasticsearch-full/7.5.0/libexec/modules/x-pack-ml/platform/darwin-x86_64/controller.app --deep
==> Caveats
Data:    /usr/local/var/lib/elasticsearch/elasticsearch_casiano/
Logs:    /usr/local/var/log/elasticsearch/elasticsearch_casiano.log
Plugins: /usr/local/var/elasticsearch/plugins/
Config:  /usr/local/etc/elasticsearch/

To have launchd start elastic/tap/elasticsearch-full now and restart at login:
  brew services start elastic/tap/elasticsearch-full
Or, if you don't want/need a background service you can just run:
  elasticsearch
==> Summary
🍺  /usr/local/Cellar/elasticsearch-full/7.5.0: 921 files, 451.1MB, built in 1 minute 44 seconds

Directory layout for Homebrew installs

| Type | Description | Default Location | Setting |
|---|---|---|---|
| home | Elasticsearch home directory or $ES_HOME | /usr/local/var/homebrew/linked/elasticsearch-full | |
| bin | Binary scripts including elasticsearch to start a node and elasticsearch-plugin to install plugins | /usr/local/var/homebrew/linked/elasticsearch-full/bin | |
| conf | Configuration files including elasticsearch.yml | /usr/local/etc/elasticsearch | ES_PATH_CONF |
| data | The location of the data files of each index / shard allocated on the node. Can hold multiple locations. | /usr/local/var/lib/elasticsearch | path.data |
| logs | Log files location. | /usr/local/var/log/elasticsearch | path.logs |
| plugins | Plugin files location. Each plugin will be contained in a subdirectory. | /usr/local/var/homebrew/linked/elasticsearch/plugins | |

This installs the most recently released default distribution of Elasticsearch. To install the OSS distribution, specify elastic/tap/elasticsearch-oss.

Running Elasticsearch 7.5.0

$ which elasticsearch
/usr/local/bin/elasticsearch
$ elasticsearch --version
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 7.5.0, Build: default/tar/e9ccaed468e2fac2275a3761849cbee64b39519f/2019-11-26T01:06:52.518245Z, JVM: 13.0.1
$ elasticsearch
...
[2019-12-18T09:52:26,489][INFO ][o.e.t.TransportService   ] [sanclemente-2.local] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
...
[2019-12-18T09:52:28,853][INFO ][o.e.h.AbstractHttpServerTransport] [sanclemente-2.local] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
...
[2019-12-18T09:52:58,833][WARN ][o.e.c.r.a.DiskThresholdMonitor] [sanclemente-2.local] high disk watermark [90%] exceeded on [VK6QoFsVQeGBAcAKIC3vLA][sanclemente-2.local][/usr/local/var/lib/elasticsearch/nodes/0] free: 19.3gb[8.2%], shards will be relocated away from this node

To fix the WARN, I edited the configuration file elasticsearch.yml and added the line cluster.routing.allocation.disk.watermark.high: 95%:

[.../etc/elasticsearch]$ sed -ne '/cluster\./p' elasticsearch.yml 
# the most important settings you may want to configure for a production cluster.
cluster.name: elasticsearch_casiano
cluster.routing.allocation.disk.watermark.high: 95%
#cluster.initial_master_nodes: ["node-1", "node-2"]

Now, however, other warnings and a complaining INFO message appear:

...
[2019-12-18T10:26:03,369][WARN ][o.e.b.BootstrapChecks    ] [sanclemente-2.local] the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
... 
[2019-12-18T10:27:04,078][INFO ][o.e.c.r.a.DiskThresholdMonitor] [sanclemente-2.local] low disk watermark [85%] exceeded on [VK6QoFsVQeGBAcAKIC3vLA][sanclemente-2.local][/usr/local/var/lib/elasticsearch/nodes/0] free: 19.2gb[8.2%], replicas will not be assigned to this node
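
The arithmetic behind these messages can be sketched as follows. This is a simplified model written for these notes, not Elasticsearch's actual implementation: the function exceeded_watermarks is hypothetical, and it only compares the used-disk percentage against the two thresholds named in the logs (85% low, 90% high).

```python
def exceeded_watermarks(free_percent, low=85.0, high=90.0):
    """Return the names of the disk watermarks exceeded for a free-space percentage."""
    used_percent = 100.0 - free_percent
    exceeded = []
    if used_percent > low:
        exceeded.append("low")
    if used_percent > high:
        exceeded.append("high")
    return exceeded


# Free space reported in the log lines above: 8.2%, i.e. 91.8% of the disk used.
print(exceeded_watermarks(8.2))             # ['low', 'high'] with the 85% / 90% thresholds
print(exceeded_watermarks(8.2, high=95.0))  # ['low'] after raising the high watermark to 95%
```

With 91.8% of the disk in use, raising the high watermark to 95% silences the WARN, but the 85% low watermark is still exceeded, which is why the INFO message remains.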

The Elasticsearch Root Route

If we visit http://localhost:9200 with the browser:

/assets/images/elasticsearch-root-page-9200.png
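
The same root document can also be fetched from a script. The sketch below uses only the Python standard library. The helper name cluster_info and its opener parameter are assumptions made for this example (the opener makes the function usable without a running node, for instance in tests). The URL is the default local endpoint shown above.

```python
import json
from urllib.request import urlopen


def cluster_info(base_url="http://localhost:9200", opener=urlopen):
    """GET the root route and return the decoded JSON document."""
    with opener(base_url + "/") as response:
        return json.loads(response.read().decode("utf-8"))
```

With a node listening on localhost:9200, cluster_info()["version"]["number"] returns the version string of the running Elasticsearch instance.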

The _cat Route

Running Elasticsearch 6.4.2

Let us see where Elasticsearch 6.4.2 is installed:

[~]$ which elasticsearch
/Users/casiano/Applications/elasticsearch-6.4.2/bin/elasticsearch

Let us run Elasticsearch 6.4.2 in development mode. The amount of output it produces is overwhelming:

[~/sol-nodejs-the-right-way(master)]$ elasticsearch
Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2019-12-15T11:28:46,903][INFO ][o.e.n.Node               ] [] initializing ...
  ...
[2019-12-15T11:28:53,337][INFO ][o.e.p.PluginsService     ] [9jAGWs_] loaded module [aggs-matrix-stats]
[2019-12-15T11:28:53,338][INFO ][o.e.p.PluginsService     ] [9jAGWs_] loaded module [analysis-common]
  ...
  
[2019-12-15T11:29:10,938][INFO ][o.e.t.TransportService   ] [9jAGWs_] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
  ...
[2019-12-15T11:29:14,175][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [9jAGWs_] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}

The last line shows that the HTTP server is listening on port 9200:

[2019-12-15T11:29:14,175][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [9jAGWs_] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}

Now we can use Insomnia or any other HTTP REST client to send queries to the Elasticsearch server:

assets/images/insomnia-elasticsearch-1.png
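
A script can play the role of the REST client as well. The following is a minimal sketch using only the Python standard library. The helper search_request is hypothetical, and the index name books and the authors field are borrowed from the example queries later in these notes. It uses POST because Elasticsearch accepts both GET and POST for _search, and not every HTTP client sends a body with GET.

```python
import json
from urllib.request import Request


def search_request(index, query, base_url="http://localhost:9200"):
    """Build (but do not send) a POST request to the _search endpoint of an index."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = search_request("books", {"match": {"authors": "Twain"}})
print(request.get_method(), request.full_url)
# POST http://localhost:9200/books/_search
```

To actually run the search, pass the request to urllib.request.urlopen while a node is running and decode the JSON response.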

References for Elasticsearch

Setup Kibana

Installing Kibana

Installing Kibana on macOS with Homebrew

This text is a copy of https://www.elastic.co/guide/en/kibana/current/brew.html#brew.

Elastic publishes Homebrew formulae so you can install Kibana with the Homebrew package manager.

To install with Homebrew, you first need to tap the Elastic Homebrew repository:

brew tap elastic/tap

Once you’ve tapped the Elastic Homebrew repo, you can use brew install to install the default distribution of Kibana:

$ brew install elastic/tap/kibana-full
Updating Homebrew...
==> Installing kibana-full from elastic/tap
==> Downloading https://artifacts.elastic.co/downloads/kibana/kibana-7.5.0-darwin-x86_64.tar.gz?tap=elastic/homebrew-tap
######################################################################## 100.0%

==> Caveats
Config: /usr/local/etc/kibana/
If you wish to preserve your plugins upon upgrade, make a copy of
/usr/local/opt/kibana-full/plugins before upgrading, and copy it into the
new keg location after upgrading.

To have launchd start elastic/tap/kibana-full now and restart at login:
  brew services start elastic/tap/kibana-full
Or, if you don't want/need a background service you can just run:
  kibana
==> Summary
🍺  /usr/local/Cellar/kibana-full/7.5.0: 94,615 files, 633.7MB, built in 8 minutes 18 seconds

This installs the most recently released default distribution of Kibana. To install the OSS distribution, specify elastic/tap/kibana-oss.

Directory layout for Homebrew installs

When you install Kibana with brew install, the config files, logs, and data directory are stored in the following locations.

| Type | Description | Default Location | Setting |
|---|---|---|---|
| home | Kibana home directory or $KIBANA_HOME | /usr/local/var/homebrew/linked/kibana-full | |
| bin | Binary scripts including kibana to start a node and kibana-plugin to install plugins | /usr/local/var/homebrew/linked/kibana-full/bin | |
| conf | Configuration files including kibana.yml | /usr/local/etc/kibana | |
| data | The location of the data files of each index / shard allocated on the node. Can hold multiple locations. | /usr/local/var/lib/kibana | path.data |
| logs | Log files location. | /usr/local/var/log/kibana | path.logs |
| plugins | Plugin files location. Each plugin will be contained in a subdirectory. | /usr/local/var/homebrew/linked/kibana-full/plugins | |

Querying Elasticsearch with Kibana

For Elasticsearch 7.5, we create the index of Project Gutenberg books from the file we prepared in the previous lab assignment:

[~/.../t3-p8-commanding-databases-marreA/esclu(master)]$ ./esclu bulk ../t1-p7-transforming-data-and-testing-continuously-marreA/data/bulk_pg.ldj -i books -t book 

Once Kibana is installed, we start it:

[.../etc/elasticsearch]$ kibana
  log   [11:08:38.205] [info][plugins-system] Setting up [15] plugins: [timelion,features,code,security,licensing,spaces,uiActions,newsfeed,expressions,inspector,embeddable,advancedUiActions,data,eui_utils,translations]
  ...
  log   [11:08:38.225] [warning][config][plugins][security] Generating a random key for xpack.security.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.security.encryptionKey in kibana.yml
  log   [11:08:38.227] [warning][config][plugins][security] Session cookies will be transmitted over insecure connections. This is not recommended.
  ...
  log   [11:09:12.590] [warning][licensing][plugins] License information could not be obtained from Elasticsearch for the [data] cluster. Error: Request Timeout after 30000ms
  log   [11:09:13.610] [warning][legacy-plugins] Skipping non-plugin directory at /usr/local/Cellar/kibana-full/7.5.0/libexec/src/legacy/core_plugins/visualizations
  ...
  log   [11:09:20.088] [warning][config][deprecation] Environment variable "DATA_PATH" will be removed.  It has been replaced with kibana.yml setting "path.data"
  ...
  log   [11:09:24.245] [warning][encrypted_saved_objects] Generating a random key for xpack.encrypted_saved_objects.encryptionKey. To be able to decrypt encrypted saved objects attributes after restart, please set xpack.encrypted_saved_objects.encryptionKey in kibana.yml
  ... from failing on restart, please set xpack.reporting.encryptionKey in kibana.yml
  log   [11:09:29.003] [info][status][plugin:reporting@7.5.0] Status changed from uninitialized to green - Ready
  log   [11:09:29.151] [info][listening] Server running at http://localhost:5601
  log   [11:09:29.917] [info][server][Kibana][http] http server running at http://localhost:5601

By default, Kibana runs on port 5601.

We open the browser at http://localhost:5601 and click on Dev Tools (the wrench icon) in the left-hand menu. This opens a panel like the following one, from which we can send requests to the Elasticsearch server:

/assets/images/kibana-query-2-elastic-search.png

Some example queries:


GET _cat/indices?v


GET books/_search
{
  "query": {
    "match": { 
      "authors": "Twain" 
    }
  }
}

GET books/_search
{
  "query": {
    "query_string": {
      "query": "authors:Twain AND subjects:Missouri AND title:Sawyer" 
    }
  }
}

GET books/_search
{
  "query": {
    "query_string": {
      "fields": ["authors", "subjects", "title"], 
      "query": "Twain AND Missouri AND Sawyer" 
    }
  }
}

POST test/test/1
{
  "title": "hello world"
}

GET test/test/1

POST test/_doc/2
{
  "title": "hola mundo"
}

GET test/_doc/2

PUT test/_doc/2
{
    "title" : "bonjour monde",
    "tags" : ["red", "blue"]
}

PUT test/_doc/2
{
    "tags" : ["green", "orange"]
}


POST test/_doc/3
{
    "title" : "SYTWS",
    "tags" : ["red", "blue"]
}

POST test/_update/3
{
    "script" : {
        "source": "ctx._source.tags = params.colors",
        "params" : {
            "colors" : ["green"]
        }
    }
}

GET test/_doc/3

DELETE test/
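
The requests on documents 2 and 3 above show two different ways of changing a document. The sketch below models them with a plain Python dictionary standing in for the test index. It is only an illustration of the semantics, not Elasticsearch code, and the names index_document, set_tags and store are made up for this example.

```python
def index_document(store, doc_id, body):
    """Model of PUT test/_doc/<id>: the stored document is replaced as a whole."""
    store[doc_id] = dict(body)


def set_tags(store, doc_id, colors):
    """Model of the scripted update: only the tags field is reassigned."""
    store[doc_id]["tags"] = list(colors)


store = {}  # hypothetical in-memory stand-in for the test index

index_document(store, 2, {"title": "bonjour monde", "tags": ["red", "blue"]})
index_document(store, 2, {"tags": ["green", "orange"]})
print(store[2])  # {'tags': ['green', 'orange']}  (the title is gone)

index_document(store, 3, {"title": "SYTWS", "tags": ["red", "blue"]})
set_tags(store, 3, ["green"])
print(store[3])  # {'title': 'SYTWS', 'tags': ['green']}
```

The second PUT on document 2 drops the title because indexing replaces the whole document, whereas the scripted update on document 3 changes tags and leaves title untouched.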

References for Kibana
