# data.grandlyon.com indexer
This collection of Python scripts allows one to index into Elasticsearch both the metadata and the data available from the data.grandlyon.com platform. Metadata are obtained from the GeoNetwork "q" API; when available, data are obtained from a PostGIS database, in the GeoJSON format. For the time being, only geographical data stored in PostGIS are indexed. The various modules exchange information through RabbitMQ. Messages are serialized using MessagePack. Some of the scripts are not at all generic, as they carry some opinions that are closely related to the data.grandlyon.com platform, in particular to the way metadata are originally entered.
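For illustration, here is a minimal sketch of the kind of message exchange involved, using the `pika` client and MessagePack serialization; the queue name and payload fields below are hypothetical and are not the ones used by the project's workers.

```python
import msgpack
import pika

# Connect to a local RabbitMQ broker (the host and queue name are illustrative only).
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="docs-to-index", durable=True)

# Serialize a JSON-like structure with MessagePack and publish it.
message = {"dataset_uuid": "some-uuid", "docs": [{"name": "a feature", "value": 42}]}
packed = msgpack.packb(message, use_bin_type=True)
channel.basic_publish(exchange="", routing_key="docs-to-index", body=packed)

# A consumer deserializes the body symmetrically.
method, properties, body = channel.basic_get(queue="docs-to-index", auto_ack=True)
if body is not None:
    payload = msgpack.unpackb(body, raw=False)
    print(payload["dataset_uuid"])

connection.close()
```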
The most "tedious" part of the workflow regards the heuristic detection of data types, which eventually ensures that all the data values are cast to the "smaller" data type which can represent all the values occurring within the various datasets: int
is "smaller" than float
, which is "smaller" than string
. Datetimes are detected, as well.
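The following sketch illustrates the kind of cascade involved. It is only an illustration, not the project's actual detection code, and the promotion order chosen for datetimes is an assumption.

```python
from dateutil import parser as date_parser  # third-party package: python-dateutil

def infer_type(value: str) -> str:
    """Return the 'smallest' type able to represent a single value."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        pass
    try:
        date_parser.parse(value)
        return "datetime"
    except (ValueError, OverflowError):
        pass
    return "string"

def infer_common_type(values):
    """A whole column gets the 'largest' type found among its values."""
    order = ["int", "float", "datetime", "string"]  # promotion order (an assumption)
    return max((infer_type(v) for v in values), key=order.index)

print(infer_common_type(["1", "2", "3"]))               # int
print(infer_common_type(["1", "2.5", "3"]))             # float
print(infer_common_type(["2019-01-01", "2019-02-03"]))  # datetime
print(infer_common_type(["1", "hello"]))                # string
```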
Some "editorial metadata" are added to raw (meta)data before actually inserting documents into Elasticsearch (cf. the "doc-indexer" module).
Here is a simplified overview of the entire workflow.
## How-to
- Generate a `config.yaml` file, using the `config.template.yaml` file as a template; customize the `docker-compose.yml` and `docker-compose-tools.yml` files, if needed.
- Run `docker-compose build`
- Run `docker-compose --compatibility up -d`
- [optional] Run `docker-compose -f docker-compose-tools.yml up delete-queues`
- [optional] Run `docker-compose -f docker-compose-tools.yml up delete-indices`
- Wait!

The indexation of a given dataset, or of all datasets, can then be (re-)triggered through the API:

```
$ curl -X GET http://<the_hostname_where_the_API_is_running>:<the_API_listening_port>/uuid/<the_uuid_of_a_given_dataset|all>[?force=true]
```
## Aliases migration
This project also includes a script that allows one to migrate aliases from one Elasticsearch instance to another.
Example usage:

```
python tools/alias_copier.py --src-es https://<source-host>:443 --dst-es https://<destination-host>:443 --src-idx <src-index> --dst-idx <dst-index> --skip <ex: preprod>
```
Prefixes or suffixes can be added to the alias names with `--prepend` and `--append`.

It is possible to skip the copy of aliases containing a particular string. The argument takes a list of strings: `--skip bar foo`.
The full list of arguments can be displayed by running the following command:

```
python tools/alias_copier.py --help
```
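Under the hood, copying aliases essentially amounts to reading the alias list attached to the source index and re-attaching it to the destination index. The sketch below illustrates the idea with the elasticsearch-py client; it is not the actual `tools/alias_copier.py` code, and the endpoints and index names are placeholders.

```python
from elasticsearch import Elasticsearch  # elasticsearch-py client assumed

# Hypothetical endpoints; the real script takes them through --src-es / --dst-es.
src_es = Elasticsearch(["https://source-host:443"])
dst_es = Elasticsearch(["https://destination-host:443"])

src_index, dst_index = "src-index", "dst-index"
skip_tokens = ["preprod"]  # aliases containing any of these strings are not copied

# Read the aliases attached to the source index...
aliases = src_es.indices.get_alias(index=src_index)[src_index]["aliases"]

# ...and attach the surviving ones to the destination index
# (--prepend / --append would simply alter alias_name at this point).
for alias_name in aliases:
    if any(token in alias_name for token in skip_tokens):
        continue
    dst_es.indices.put_alias(index=dst_index, name=alias_name)
```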
## Tests

Install pytest:

```
pip install pytest
```

Run the tests:

```
python -m pytest
```
## TODO

- producing indexation reports out of log messages (cf. the branches `Denis_clean_full_datalogger_31Oct` and `Denis_full_datalogs_Stack_October_31`)
- indexing non-geographical data
- rendering the code less opinionated / more generic
- removing dead code (ex.: `deferred_count`, present in various workers)
- periodically cleaning up the working directory
- adding a `health` endpoint to the API (a possible shape is sketched after this list), which should at least check that
  - no (meta)data in Elasticsearch is older than N hours/days/... (depending on the expected behaviour);
  - Elasticsearch did not enter the read-only state.

  *N.B. should Elasticsearch enter such a state, writes can be re-enabled by issuing the following command from a shell:*

  ```
  $ curl -XPUT -H "Content-Type: application/json" http(s)://<es_host>:<es_port>/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
  ```

- upgrading to Elasticsearch 7.x
- etc.
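Regarding the `health` endpoint item above, the sketch below shows one possible shape for such a check. It is purely hypothetical (the endpoint does not exist yet), it assumes Flask and the elasticsearch-py client, and the index and field names are made up.

```python
from elasticsearch import Elasticsearch  # elasticsearch-py client assumed
from flask import Flask, jsonify         # Flask chosen only for this example

app = Flask(__name__)
es = Elasticsearch(["http://localhost:9200"])  # illustrative endpoint

@app.route("/health")
def health():
    problems = []

    # 1. No (meta)data older than N hours: here N = 24, and "last-update"
    #    is a hypothetical date field.
    recent = es.count(
        index="some-index",
        body={"query": {"range": {"last-update": {"gte": "now-24h"}}}},
    )["count"]
    if recent == 0:
        problems.append("no (meta)data indexed during the last 24 hours")

    # 2. Elasticsearch did not switch any index to the read-only state.
    settings = es.indices.get_settings(index="_all")
    for index_name, index_settings in settings.items():
        blocks = index_settings["settings"]["index"].get("blocks", {})
        if blocks.get("read_only_allow_delete") == "true":
            problems.append("%s is read-only" % index_name)

    return jsonify({"ok": not problems, "problems": problems}), (200 if not problems else 503)
```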