3rd year, 2nd semester. Information Technologies track. Degree in Computer Engineering. Academic year 2019/2020
Elasticsearch is a search engine built on top of Apache Lucene.
Lucene is a library that implements a full-text search engine. It is not an application but an API that provides search capabilities.
An inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.
The idea is similar to the cross-reference indexes that usually appear at the end of books.
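As a rough illustration of the definition above, the following sketch builds a tiny inverted index in Python. The sample documents and the function names `build_inverted_index` and `search` are hypothetical, chosen only for this example; Lucene's real data structures are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, *words):
    """Return the ids of the documents that contain every given word."""
    postings = [set(index.get(word.lower(), ())) for word in words]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "the adventures of tom sawyer",
    2: "adventures of huckleberry finn",
    3: "the prince and the pauper",
}
index = build_inverted_index(docs)
print(index["adventures"])                 # ids of the documents with that word
print(search(index, "the", "adventures"))  # documents containing both words
```

Adding a document means updating the entry of every word it contains, which is the extra indexing cost mentioned in the definition; a lookup, in exchange, is a dictionary access plus a set intersection.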
Shard: A fragment of an index. An index is divided into one or more shards to make the data distributable. Shards can be stored on a single node or multiple nodes and are composed of Lucene segments.
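To see how an index can be split into shards, here is a minimal sketch of hash-based routing: each document id is hashed and the remainder modulo the number of shards picks the shard. This is only the general idea. The CRC32 hash, the function names and the `pg1` ... `pg10` ids are assumptions of this example, and Elasticsearch's own hash function and routing formula differ.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a shard by hashing the document id.

    CRC32 is a stand-in hash chosen for this sketch, not the one
    Elasticsearch uses.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_shards


def distribute(doc_ids, number_of_shards):
    """Group document ids by the shard they are routed to."""
    shards = {n: [] for n in range(number_of_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, number_of_shards)].append(doc_id)
    return shards


# Hypothetical document ids routed to three shards.
shards = distribute([f"pg{n}" for n in range(1, 11)], 3)
for number, ids in shards.items():
    print(number, ids)
```

Since the shard depends only on the id and the number of shards, any node can compute where a document lives without a lookup table.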
Text analyzers: Text processors that transform the content of the different fields to enable additional search functionality.
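A very small sketch of what such a transformation might look like: the text is split into tokens, lowercased, and common words are dropped. The stop-word list and the `analyze` function are hypothetical and only mimic the spirit of an analyzer; the analyzers shipped with Elasticsearch are configurable pipelines and behave differently in the details.

```python
import re

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def analyze(text):
    """Split on non-word characters, lowercase, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


tokens = analyze("The Adventures of Tom Sawyer")
print(tokens)  # ['adventures', 'tom', 'sawyer']
```

With this kind of normalization a query for "sawyer" matches a title written as "Sawyer", because both go through the same transformation before being compared.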
### Notes
Elasticsearch is built on Java 8.
Instructions on how to install Java 8 are available on Oracle’s website
You can run java -version from the command line to confirm that Java is installed and ready.
$ java --version
java 9.0.4
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
One way to install Elasticsearch is to go to the downloads page.
The book uses version 5.2, which can also be downloaded from there.
A quick start guide is available as well.
This is the version I used in my installation, 6.4.2, to follow the book in late 2018 and early 2019:
$ elasticsearch --version
Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 6.4.2, Build: default/tar/04711c2/2018-09-26T13:34:09.098244Z, JVM: 9.0.4
Once you download the archive, unzip it and run bin/elasticsearch from the command line.
You should see a lot of output containing something like the following (much of the output is omitted here for brevity).
$ bin/elasticsearch
[INFO ][o.e.n.Node ] [] initializing ...
... many lines omitted ...
[INFO ][o.e.h.HttpServer ] [kAh7Q7Z] publish_address {127.0.0.1:9200},
bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node ] [kAh7Q7Z] started
[INFO ][o.e.g.GatewayService ] [kAh7Q7Z] recovered [0] indices into
cluster_state
Notice the publish_address and bound_addresses listed toward the end of the output.
By default, Elasticsearch binds TCP port 9200 for its HTTP endpoint.
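If you want to pick the address out of the startup log programmatically, a regular expression is enough. The sketch below assumes log lines shaped like the ones shown above; the pattern and the `publish_address` helper are hypothetical, not part of any Elasticsearch tooling.

```python
import re

# Matches "publish_address {host:port}"; the host may be an IPv4 address
# or a bracketed IPv6 address such as [::1].
PUBLISH_RE = re.compile(r"publish_address \{(.+?):(\d+)\}")


def publish_address(log_line):
    """Return (host, port) from a log line, or None if there is none."""
    match = PUBLISH_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


line = "[INFO ][o.e.h.HttpServer ] [kAh7Q7Z] publish_address {127.0.0.1:9200},"
print(publish_address(line))  # ('127.0.0.1', 9200)
```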
You can specify a lot of settings when setting up an Elasticsearch cluster. By default, it runs in development mode.
A full discussion of the Elasticsearch cluster settings for version 5.2 is available on Elastic’s Important System Configuration 5.2 page. The same instructions are also available for the current version.
To have Elasticsearch in the PATH, I have added a small script in my ~/.bash_profile:
[~/campus-virtual/1819/ca1819/practicas(master)]$ cat ~/.bash_profile | sed -ne '/elastic/,/^$/p'
source ~/bin/elasticsearch-set
With these contents:
[~/campus-virtual/1819/ca1819/practicas(master)]$ cat ~/bin/elasticsearch-set
export ES_HOME=~/Applications/elasticsearch-6.4.2
export PATH=$ES_HOME/bin:$PATH
As of December 2019, the current version is 7.5.0.
Notes taken from https://www.elastic.co/guide/en/elasticsearch/reference/current/brew.html
Elastic publishes Homebrew formulae so you can install Elasticsearch with the Homebrew package manager.
To install with Homebrew, you first need to tap the Elastic Homebrew repository:
brew tap elastic/tap
Once you’ve tapped the Elastic Homebrew repo, you can use brew install to install the default distribution of Elasticsearch:
[~/.../transforming-data-and-testing-continuously-chapter-5/databases(master)]$ brew install elastic/tap/elasticsearch-full
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Updated Formulae
allure bedtools c-blosc convox csvq golang-migrate helmfile micronaut mitmproxy
==> Installing elasticsearch-full from elastic/tap
==> Downloading https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.0-darwin-x86_64.tar.gz?tap=elastic/homebrew-tap
######################################################################## 100.0%
==> codesign -f -s - /usr/local/Cellar/elasticsearch-full/7.5.0/libexec/modules/x-pack-ml/platform/darwin-x86_64/controller.app --deep
==> Caveats
Data: /usr/local/var/lib/elasticsearch/elasticsearch_casiano/
Logs: /usr/local/var/log/elasticsearch/elasticsearch_casiano.log
Plugins: /usr/local/var/elasticsearch/plugins/
Config: /usr/local/etc/elasticsearch/
To have launchd start elastic/tap/elasticsearch-full now and restart at login:
brew services start elastic/tap/elasticsearch-full
Or, if you don't want/need a background service you can just run:
elasticsearch
==> Summary
🍺 /usr/local/Cellar/elasticsearch-full/7.5.0: 921 files, 451.1MB, built in 1 minute 44 seconds
Type | Description | Default Location | Setting |
---|---|---|---|
home | Elasticsearch home directory | | |
bin | Binary scripts | | |
conf | Configuration files | | |
data | The location of the data files of each index / shard allocated on the node. Can hold multiple locations. | | |
logs | Log files location. | | |
plugins | Plugin files location. Each plugin will be contained in a subdirectory. | | |
This installs the most recently released default distribution of Elasticsearch. To install the OSS distribution, specify elastic/tap/elasticsearch-oss.
$ which elasticsearch
/usr/local/bin/elasticsearch
$ elasticsearch --version
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 7.5.0, Build: default/tar/e9ccaed468e2fac2275a3761849cbee64b39519f/2019-11-26T01:06:52.518245Z, JVM: 13.0.1
$ elasticsearch
...
[2019-12-18T09:52:26,489][INFO ][o.e.t.TransportService ] [sanclemente-2.local] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
...
[2019-12-18T09:52:28,853][INFO ][o.e.h.AbstractHttpServerTransport] [sanclemente-2.local] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
...
[2019-12-18T09:52:58,833][WARN ][o.e.c.r.a.DiskThresholdMonitor] [sanclemente-2.local] high disk watermark [90%] exceeded on [VK6QoFsVQeGBAcAKIC3vLA][sanclemente-2.local][/usr/local/var/lib/elasticsearch/nodes/0] free: 19.3gb[8.2%], shards will be relocated away from this node
To fix the WARN I edited the configuration file elasticsearch.yml, adding the line cluster.routing.allocation.disk.watermark.high: 95%:
[.../etc/elasticsearch]$ sed -ne '/cluster\./p' elasticsearch.yml
# the most important settings you may want to configure for a production cluster.
cluster.name: elasticsearch_casiano
cluster.routing.allocation.disk.watermark.high: 95%
#cluster.initial_master_nodes: ["node-1", "node-2"]
Now, however, other warnings and a complaining INFO message show up:
...
[2019-12-18T10:26:03,369][WARN ][o.e.b.BootstrapChecks ] [sanclemente-2.local] the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
...
[2019-12-18T10:27:04,078][INFO ][o.e.c.r.a.DiskThresholdMonitor] [sanclemente-2.local] low disk watermark [85%] exceeded on [VK6QoFsVQeGBAcAKIC3vLA][sanclemente-2.local][/usr/local/var/lib/elasticsearch/nodes/0] free: 19.2gb[8.2%], replicas will not be assigned to this node
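The arithmetic behind these messages can be sketched as follows. The thresholds 85% and 90% are the ones printed in the log lines above. The `watermark_status` function is hypothetical and only mirrors the comparison, not Elasticsearch's actual monitor.

```python
def watermark_status(free_percent, low=85.0, high=90.0):
    """Classify disk usage (percent used) against the low/high watermarks."""
    used = 100.0 - free_percent
    if used > high:
        return "high watermark exceeded"
    if used > low:
        return "low watermark exceeded"
    return "ok"


# With 8.2% free the disk is about 91.8% used.
print(watermark_status(8.2))             # above the 90% high watermark
print(watermark_status(8.2, high=95.0))  # below 95%, but still above the 85% low watermark
```

This is consistent with what the logs show: raising the high watermark to 95% silences the WARN, but the low watermark message remains because about 91.8% of the disk is still in use.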
If we visit http://localhost:9200 with the browser, the server answers with basic information about the node.
The request curl localhost:9200/_cat returns the list of available endpoints:
[.../etc/elasticsearch]$ curl localhost:9200/_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
Verbose mode:
$ curl localhost:9200/_cat/master?v
id host ip node
VK6QoFsVQeGBAcAKIC3vLA 127.0.0.1 127.0.0.1 sanclemente-2.local
Help:
$ curl localhost:9200/_cat/master?help
id | | node id
host | h | host name
ip | | ip address
node | n | node name
Each of the commands accepts a query string parameter h which forces only those columns to appear:
curl localhost:9200/_cat/nodes?h=ip,port,heapPercent,name
[.../etc/elasticsearch]$ curl localhost:9200/_cat/nodes
127.0.0.1 28 99 24 2.56 dilm * sanclemente-2.local
[.../etc/elasticsearch]$ curl localhost:9200/_cat/nodes?help | head -n 5
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 17533 100 17533 0 0 903k 0 --:--:-- --:--:-- --:--:-- 951k
id | id,nodeId | unique node id
pid | p | process id
ip | i | ip address
port | po | bound transport port
http_address | http | bound http address
[.../etc/elasticsearch]$ curl localhost:9200/_cat/nodes?h=ip,port,heapPercent,name
127.0.0.1 9300 28 sanclemente-2.local
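The same requests can be prepared and post-processed from a script. The sketch below only builds the URLs and parses a verbose table like the one shown above, so it runs without a server. The helpers `cat_url` and `parse_cat_table` are hypothetical names, and splitting on whitespace is a simplification that would break if a column were empty.

```python
from urllib.parse import urlencode


def cat_url(endpoint, columns=None, verbose=False, base="http://localhost:9200"):
    """Build a _cat URL with the optional v and h query string parameters."""
    params = {}
    if verbose:
        params["v"] = "true"
    if columns:
        params["h"] = ",".join(columns)
    query = urlencode(params, safe=",")  # keep the commas of h readable
    url = f"{base}/_cat/{endpoint}"
    return f"{url}?{query}" if query else url


def parse_cat_table(text):
    """Turn the output of a verbose _cat request into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


# Output of the verbose _cat/master request shown earlier in these notes.
sample = """id host ip node
VK6QoFsVQeGBAcAKIC3vLA 127.0.0.1 127.0.0.1 sanclemente-2.local
"""
print(cat_url("nodes", columns=["ip", "port", "heapPercent", "name"]))
print(parse_cat_table(sample)[0]["node"])
```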
Let us see where elasticsearch 6.4.2 is installed:
[~]$ which elasticsearch
/Users/casiano/Applications/elasticsearch-6.4.2/bin/elasticsearch
Let us execute elasticsearch 6.4.2 in development mode.
The flow of output when executed is overwhelming:
[~/sol-nodejs-the-right-way(master)]$ elasticsearch
Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2019-12-15T11:28:46,903][INFO ][o.e.n.Node ] [] initializing ...
...
[2019-12-15T11:28:53,337][INFO ][o.e.p.PluginsService ] [9jAGWs_] loaded module [aggs-matrix-stats]
[2019-12-15T11:28:53,338][INFO ][o.e.p.PluginsService ] [9jAGWs_] loaded module [analysis-common]
...
[2019-12-15T11:29:10,938][INFO ][o.e.t.TransportService ] [9jAGWs_] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
...
[2019-12-15T11:29:14,175][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [9jAGWs_] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
We can see in the last line that it is listening on port 9200:
[2019-12-15T11:29:14,175][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [9jAGWs_] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
Now we can use Insomnia or any other HTTP REST client to make queries to the Elasticsearch server.
This text is a copy of https://www.elastic.co/guide/en/kibana/current/brew.html#brew.
Elastic publishes Homebrew formulae so you can install Kibana with the Homebrew package manager.
To install with Homebrew, you first need to tap the Elastic Homebrew repository:
brew tap elastic/tap
Once you’ve tapped the Elastic Homebrew repo, you can use brew install to install the default distribution of Kibana:
$ brew install elastic/tap/kibana-full
Updating Homebrew...
==> Installing kibana-full from elastic/tap
==> Downloading https://artifacts.elastic.co/downloads/kibana/kibana-7.5.0-darwin-x86_64.tar.gz?tap=elastic/homebrew-tap
######################################################################## 100.0%
==> Caveats
Config: /usr/local/etc/kibana/
If you wish to preserve your plugins upon upgrade, make a copy of
/usr/local/opt/kibana-full/plugins before upgrading, and copy it into the
new keg location after upgrading.
To have launchd start elastic/tap/kibana-full now and restart at login:
brew services start elastic/tap/kibana-full
Or, if you don't want/need a background service you can just run:
kibana
==> Summary
🍺 /usr/local/Cellar/kibana-full/7.5.0: 94,615 files, 633.7MB, built in 8 minutes 18 seconds
This installs the most recently released default distribution of Kibana. To install the OSS distribution, specify elastic/tap/kibana-oss.
When you install Kibana with brew install, the config files, logs, and data directory are stored in the following locations.
Type | Description | Default Location | Setting |
---|---|---|---|
home | Kibana home directory | | |
bin | Binary scripts | | |
conf | Configuration files | | |
data | The location of the data files of each index / shard allocated on the node. Can hold multiple locations. | | |
logs | Log files location. | | |
plugins | Plugin files location. Each plugin will be contained in a subdirectory. | | |
For Elasticsearch 7.5 we create the index of Project Gutenberg books with the file we prepared in the previous assignment:
[~/.../t3-p8-commanding-databases-marreA/esclu(master)]$ ./esclu bulk ../t1-p7-transforming-data-and-testing-continuously-marreA/data/bulk_pg.ldj -i books -t book
Once Kibana is installed, we start it:
[.../etc/elasticsearch]$ kibana
log [11:08:38.205] [info][plugins-system] Setting up [15] plugins: [timelion,features,code,security,licensing,spaces,uiActions,newsfeed,expressions,inspector,embeddable,advancedUiActions,data,eui_utils,translations]
...
log [11:08:38.225] [warning][config][plugins][security] Generating a random key for xpack.security.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.security.encryptionKey in kibana.yml
log [11:08:38.227] [warning][config][plugins][security] Session cookies will be transmitted over insecure connections. This is not recommended.
...
log [11:09:12.590] [warning][licensing][plugins] License information could not be obtained from Elasticsearch for the [data] cluster. Error: Request Timeout after 30000ms
log [11:09:13.610] [warning][legacy-plugins] Skipping non-plugin directory at /usr/local/Cellar/kibana-full/7.5.0/libexec/src/legacy/core_plugins/visualizations
...
log [11:09:20.088] [warning][config][deprecation] Environment variable "DATA_PATH" will be removed. It has been replaced with kibana.yml setting "path.data"
...
log [11:09:24.245] [warning][encrypted_saved_objects] Generating a random key for xpack.encrypted_saved_objects.encryptionKey. To be able to decrypt encrypted saved objects attributes after restart, please set xpack.encrypted_saved_objects.encryptionKey in kibana.yml
... from failing on restart, please set xpack.reporting.encryptionKey in kibana.yml
log [11:09:29.003] [info][status][plugin:reporting@7.5.0] Status changed from uninitialized to green - Ready
log [11:09:29.151] [info][listening] Server running at http://localhost:5601
log [11:09:29.917] [info][server][Kibana][http] http server running at http://localhost:5601
By default Kibana runs on port 5601.
We open the browser at http://localhost:5601 and click on the development tools (the wrench icon) in the left-hand menu. This opens a panel in which we can make requests to the Elasticsearch server.
Some example queries:
GET _cat/indices?v

GET books/_search
{
  "query": {
    "match": {
      "authors": "Twain"
    }
  }
}

GET books/_search
{
  "query": {
    "query_string": {
      "query": "authors:Twain AND subjects:Missouri AND title:Sawyer"
    }
  }
}

GET books/_search
{
  "query": {
    "query_string": {
      "fields": ["authors", "subjects", "title"],
      "query": "Twain AND Missouri AND Sawyer"
    }
  }
}
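When these searches are sent from a program instead of the Kibana console, the request bodies are ordinary JSON documents. The following sketch builds the three bodies above in Python. The helper names are hypothetical and nothing is sent to the server.

```python
import json


def match_query(field, text):
    """Body of a match query on a single field."""
    return {"query": {"match": {field: text}}}


def query_string_by_field(terms):
    """Body of a query_string query; terms maps field -> word, joined with AND."""
    query = " AND ".join(f"{field}:{word}" for field, word in terms.items())
    return {"query": {"query_string": {"query": query}}}


def query_string_over_fields(fields, words):
    """Body of a query_string query that looks for the words in a list of fields."""
    return {"query": {"query_string": {"fields": list(fields),
                                       "query": " AND ".join(words)}}}


body = query_string_by_field(
    {"authors": "Twain", "subjects": "Missouri", "title": "Sawyer"})
print(json.dumps(body, indent=2))
```

Note that the two query_string forms are not equivalent: the first ties each word to one field, while the second lets each word match in any of the listed fields.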
POST test/test/1
{
  "title": "hello world"
}

GET test/test/1

POST test/_doc/2
{
  "title": "hola mundo"
}

GET test/_doc/2

PUT test/_doc/2
{
  "title": "bonjour monde",
  "tags": ["red", "blue"]
}

PUT test/_doc/2
{
  "tags": ["green", "orange"]
}

POST test/_doc/3
{
  "title": "SYTWS",
  "tags": ["red", "blue"]
}

POST test/_update/3
{
  "script": {
    "source": "ctx._source.tags = params.colors",
    "params": {
      "colors": ["green"]
    }
  }
}

GET test/_doc/3

DELETE test/
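The difference between the second PUT on document 2 and the scripted _update on document 3 is easy to miss: indexing a document with an existing id replaces it as a whole, while the update script changes only the fields it touches. The sketch below emulates both with a plain Python dict standing in for the index. The `put` and `scripted_update` functions are hypothetical and do not talk to Elasticsearch.

```python
def put(index, doc_id, body):
    """Emulate PUT <index>/_doc/<id>: the stored document is replaced as a whole."""
    index[doc_id] = dict(body)


def scripted_update(index, doc_id, assignments):
    """Emulate a scripted _update: only the assigned fields of _source change."""
    index[doc_id].update(assignments)


test = {}  # in-memory stand-in for the "test" index
put(test, 2, {"title": "bonjour monde", "tags": ["red", "blue"]})
put(test, 2, {"tags": ["green", "orange"]})    # the title is lost
put(test, 3, {"title": "SYTWS", "tags": ["red", "blue"]})
scripted_update(test, 3, {"tags": ["green"]})  # the title is kept

print(test[2])
print(test[3])
```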