
data.grandlyon.com indexer

This collection of Python scripts indexes into Elasticsearch both the metadata and the data available on the data.grandlyon.com platform. Metadata are obtained from the GeoNetwork "q" API; when available, data are obtained from a PostGIS database, in GeoJSON format. For the time being, only geographical data stored in PostGIS are indexed. The various modules exchange information through RabbitMQ, with messages serialized using MessagePack. Some of the scripts are not generic at all, as they carry opinions closely tied to the data.grandlyon.com platform, in particular to the way metadata are originally entered.
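
As a minimal sketch of the inter-module exchange (the queue name and payload fields below are illustrative placeholders, not the project's actual message contract):

```python
import msgpack
import pika

# A MessagePack-serialized payload published to a RabbitMQ queue;
# broker host, queue name and record fields are placeholders.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="metadata-records")

payload = msgpack.packb({"uuid": "some-record-uuid", "title": "Some dataset"},
                        use_bin_type=True)
channel.basic_publish(exchange="", routing_key="metadata-records", body=payload)

# A consumer would decode the body symmetrically:
# record = msgpack.unpackb(body, raw=False)
connection.close()
```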

The most "tedious" part of the workflow is the heuristic detection of data types, which ensures that every data value is cast to the "smallest" data type able to represent all the values occurring within a given dataset: int is "smaller" than float, which is "smaller" than string. Datetimes are detected as well.
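
The narrowing logic can be sketched as follows (a deliberate simplification, assuming values arrive as strings; the actual field-type-detector module is more thorough, notably about datetime formats):

```python
from datetime import datetime

# Candidate casts, ordered from "smallest" to widest; a type is kept
# only if every value in the column can be represented by it.
CASTS = [
    ("int", int),
    ("float", float),
    ("datetime", datetime.fromisoformat),  # naive ISO 8601 detection only
]

def detect_type(values):
    """Return the narrowest type name able to represent all the values."""
    for name, cast in CASTS:
        try:
            for value in values:
                cast(value)
            return name
        except (ValueError, TypeError):
            continue  # this type is too narrow, try the next one
    return "str"  # strings can represent anything

# "3" and "14" would fit int, but "2.5" widens the column to float.
assert detect_type(["3", "14", "2.5"]) == "float"
```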

Some "editorial metadata" are added to raw (meta)data before actually inserting documents into Elasticsearch (cf. the "doc-indexer" module).

A simplified overview of the entire workflow is provided by the attached draw.io diagram.

How-to

  1. Generate a config.yaml file, using the config.template.yaml file as a template; customize the docker-compose-workers.yml and docker-compose-tools.yml files, if needed.
  2. Run docker-compose -f docker-compose-workers.yml build
  3. Run docker-compose -f docker-compose-workers.yml up -d
  4. [optional] Run docker-compose -f docker-compose-tools.yml up delete-queues
  5. [optional] Run docker-compose -f docker-compose-tools.yml up delete-indices
  6. Run docker-compose -f docker-compose-tools.yml up setup-indices
  7. Run docker-compose -f docker-compose-tools.yml up field-type-detector
  8. Wait! (one way to check that the RabbitMQ queues have drained is sketched after this list)
  9. Run docker-compose -f docker-compose-tools.yml up metadata-getter
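
To tell whether messages are still flowing during step 8, one option is to inspect a queue with the pika library; the broker host and queue name below are placeholders for whatever your config.yaml declares:

```python
import pika

# Inspect a RabbitMQ queue without consuming it; host and queue name
# are placeholders to be replaced with the values from config.yaml.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# passive=True only checks the queue, it does not create or modify it.
queue = channel.queue_declare(queue="doc-processing", passive=True)
print(f"{queue.method.message_count} messages still pending")
connection.close()
```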

To connect to Metabase:

Browse to port 3001 (e.g., http://localhost:3001) and use the interface to create an account, then connect to the MongoDB database with the following parameters:

  1. Database type: MongoDB
  2. Name: mongo-indexer
  3. Host: mongo
  4. Database name: indexerdb
  5. Port: <mongo_port>
  6. Database username: <mongo_user_login>
  7. Database password: <mongo_user_password>

Use the uuid endpoint to trigger the main indexation method for a given metadata record:

http://localhost:8000/uuid/<record_uuid>; example: http://localhost:8000/uuid/e23c6d3e-40be-4d5e-bc15-4b7e7313942f
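
The endpoint can also be called programmatically; a minimal sketch with the requests library, reusing the host and port from the example above:

```python
import requests

# Trigger indexation of the dataset identified by this metadata UUID.
record_uuid = "e23c6d3e-40be-4d5e-bc15-4b7e7313942f"
response = requests.get(f"http://localhost:8000/uuid/{record_uuid}")
print(response.status_code, response.text)
```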

Use the status endpoint to filter a session's status records, stored in MongoDB, by key and value:

/status/<session_id>/<query_key>/<query_value>; example: /status/bb0764fe-324e-4719-9214-dc4abc59fe50/step/doc-processor
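
For instance, keeping only the records produced by the doc-processor step (assuming the endpoint is served by the same host and port as the uuid endpoint, and returns JSON):

```python
import requests

# Filter a session's status records by an arbitrary key/value pair.
session_id = "bb0764fe-324e-4719-9214-dc4abc59fe50"
url = f"http://localhost:8000/status/{session_id}/step/doc-processor"
print(requests.get(url).json())  # assumes a JSON response body
```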

TODO

  • adding an HTTP API to trigger indexation
  • producing indexation reports
  • indexing non-geographical data
  • rendering the code less opinionated / more generic
  • testing, testing, testing, ...
  • etc.