# Deployment of the beta version
    
The software is hosted on five machines with the following hostnames and specs:
    
    * **front-web**: 7 GB RAM; 2 vCores; 50 GB SSD
    * **back-office**: 15 GB RAM; 4 vCores; 100 GB SSD
    * **es-1**: 30 GB RAM; 8 vCores; 200 GB SSD
    * **es-2**: 30 GB RAM; 8 vCores; 200 GB SSD
    * **es-3**: 30 GB RAM; 8 vCores; 200 GB SSD
    
    
The above machines exchange information through a private LAN (`192.168.0.0/24`); `front-web` is the only instance directly connected to the Internet, through its WAN interface `ens3` and public IP addresses `51.83.13.51` (standard) and `91.121.35.236` (failover).
    
    
The following diagram provides a sketch of the various applications hosted by the infrastructure: ![beta-deployment](../assets/beta-deployment.png)
    
Deployments are performed using GitLab CI. Details on each machine's role and configuration are provided below.
    
    ## front-web
    
    The **front-web** machine has the following roles:
    
    * router, firewall
    * DNS server
    * SMTP server
    * Reverse Proxy
    
These roles are fulfilled through the configuration detailed below.
    
    ### router, firewall
    
    The relevant configuration is stored within the file `/etc/iptables/rules.v4`:
    
    ```
    *nat
    :PREROUTING ACCEPT [541:33128]
    :INPUT ACCEPT [333:20150]
    :OUTPUT ACCEPT [683:49410]
    :POSTROUTING ACCEPT [683:49410]
    -A POSTROUTING -s 192.168.0.0/24 -o ens3 -j MASQUERADE
    
    -A POSTROUTING -o ens3 -j SNAT --to-source 91.121.35.236
    
    COMMIT
    
    *filter
    :INPUT DROP [173:7020]
    :FORWARD ACCEPT [2218:856119]
    :OUTPUT ACCEPT [5705:2627050]
    -A INPUT -s 192.168.0.0/24 -m comment --comment "FULL ACCESS LAN" -j ACCEPT
    -A INPUT -i lo -m comment --comment "FULL ACCESS LOOPBACK" -j ACCEPT
    -A INPUT -s 217.182.252.78/32 -p tcp -m tcp --dport 22 -m comment --comment "SSH neogeo-ansible" -j ACCEPT
    -A INPUT -s 80.12.88.99/32 -p tcp -m tcp --dport 22 -m comment --comment "SSH neogeo-bureau" -j ACCEPT
    -A INPUT -s 213.245.116.190/32 -p tcp -m tcp --dport 22 -m comment --comment "SSH erasmes" -j ACCEPT
    -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -m comment --comment "in order to receive responses to outgoing requests" -j ACCEPT
    -A INPUT -d 51.83.13.51/32 -i ens3 -p tcp -m tcp --dport 443 -j ACCEPT
    -A INPUT -d 51.83.13.51/32 -i ens3 -p tcp -m tcp --dport 80 -j ACCEPT
    
    -A INPUT -d 91.121.35.236/32 -i ens3 -p tcp -m tcp --dport 443 -j ACCEPT
    -A INPUT -d 91.121.35.236/32 -i ens3 -p tcp -m tcp --dport 80 -j ACCEPT
    
    COMMIT
    ```
    
    Moreover, the following line must appear in the `/etc/sysctl.conf` file:
    
    `net.ipv4.ip_forward=1`
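
The `INPUT` chain above is evaluated top to bottom: the first matching rule wins, and a packet matching no rule falls through to the `DROP` policy. The following Python sketch mimics that first-match logic for new TCP connections under simplifying assumptions: the conntrack rule for established traffic is left out, and the `Rule` class and `verdict` helper are hypothetical names, not part of `iptables`.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Simplified INPUT rule; a field left to None matches any value."""

    comment: str
    src: Optional[str] = None    # source network (-s)
    dst: Optional[str] = None    # destination network (-d)
    iface: Optional[str] = None  # input interface (-i)
    dport: Optional[int] = None  # TCP destination port (--dport)

    def matches(self, src: str, dst: str, iface: str, dport: int) -> bool:
        if self.src is not None and ipaddress.ip_address(src) not in ipaddress.ip_network(self.src):
            return False
        if self.dst is not None and ipaddress.ip_address(dst) not in ipaddress.ip_network(self.dst):
            return False
        if self.iface is not None and iface != self.iface:
            return False
        if self.dport is not None and dport != self.dport:
            return False
        return True


# Same order as the INPUT chain above (TCP only; the conntrack rule is omitted).
INPUT_RULES = [
    Rule("FULL ACCESS LAN", src="192.168.0.0/24"),
    Rule("FULL ACCESS LOOPBACK", iface="lo"),
    Rule("SSH neogeo-ansible", src="217.182.252.78/32", dport=22),
    Rule("SSH neogeo-bureau", src="80.12.88.99/32", dport=22),
    Rule("SSH erasmes", src="213.245.116.190/32", dport=22),
    Rule("HTTPS standard IP", dst="51.83.13.51/32", iface="ens3", dport=443),
    Rule("HTTP standard IP", dst="51.83.13.51/32", iface="ens3", dport=80),
    Rule("HTTPS failover IP", dst="91.121.35.236/32", iface="ens3", dport=443),
    Rule("HTTP failover IP", dst="91.121.35.236/32", iface="ens3", dport=80),
]


def verdict(src: str, dst: str, iface: str, dport: int) -> str:
    """First matching rule wins; otherwise the chain policy (DROP) applies."""
    for rule in INPUT_RULES:
        if rule.matches(src, dst, iface, dport):
            return "ACCEPT"
    return "DROP"
```

For instance, a client at `203.0.113.7` reaching `51.83.13.51` on port 443 is accepted, whereas the same client trying SSH (port 22) is dropped: SSH is only open to the three whitelisted addresses and to the LAN.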
    
    ### DNS server
    
We rely on `dnsmasq`, which was installed via `apt`. The relevant configuration is stored in the `/etc/dnsmasq.conf` file, which reads as follows:
    ```
    domain-needed
    bogus-priv
    server=213.186.33.99
    listen-address=192.168.0.59
    no-dhcp-interface=ens4
    bind-interfaces
    ```
    
The following lines were appended to the `/etc/hosts` file, allowing `dnsmasq` to resolve the hostnames of the entire infrastructure:
    ```
    51.83.13.51     front-web.wan
    192.168.0.59    front-web.lan
    
    51.83.15.2      back-office.wan
    192.168.0.146   back-office.lan
    
    51.68.115.202   es-1.wan
    192.168.0.74    es-1.lan
    
    51.77.229.85    es-2.wan
    192.168.0.65    es-2.lan
    
    51.83.13.94     es-3.wan
    192.168.0.236   es-3.lan
    
    ```
    
The WAN addresses were declared even though they are not actually used (except for the `front-web` instance).
    
Note that, by default, the `/etc/hosts` file is managed by the hosting service (through cloud-init). To prevent manual modifications from being reset at every reboot, the following line must be set in the `/etc/cloud/cloud.cfg` file:
    
    `manage_etc_hosts: false`
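
Since `dnsmasq` answers queries from `/etc/hosts`, a typo in that file directly breaks name resolution on the LAN. The Python sketch below, whose helper names are hypothetical, parses hosts-file lines and checks that every `*.lan` name points into `192.168.0.0/24`.

```python
import ipaddress

LAN = ipaddress.ip_network("192.168.0.0/24")

# Entries appended to /etc/hosts on front-web (copied from above).
HOSTS = """\
51.83.13.51     front-web.wan
192.168.0.59    front-web.lan
51.83.15.2      back-office.wan
192.168.0.146   back-office.lan
51.68.115.202   es-1.wan
192.168.0.74    es-1.lan
51.77.229.85    es-2.wan
192.168.0.65    es-2.lan
51.83.13.94     es-3.wan
192.168.0.236   es-3.lan
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname to its address, skipping blank lines and comments."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping[name] = address
    return mapping


def lan_names_outside_lan(mapping: dict[str, str]) -> list[str]:
    """Return the *.lan names whose address is not in the private LAN."""
    return sorted(
        name
        for name, addr in mapping.items()
        if name.endswith(".lan") and ipaddress.ip_address(addr) not in LAN
    )


if __name__ == "__main__":
    hosts = parse_hosts(HOSTS)
    print(hosts["es-2.lan"])             # 192.168.0.65
    print(lan_names_outside_lan(hosts))  # []
```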
    
    
    ### SMTP server
    
`postfix` and `opendkim` were installed through `apt`. The latter was set up following the instructions found at [https://wiki.debian.org/opendkim](https://wiki.debian.org/opendkim). In particular, the following commands were issued as `root`:
    
    ```
    mkdir /etc/postfix/dkim/
    opendkim-genkey -D /etc/postfix/dkim/ -d data.beta.grandlyon.com -s mail
    chgrp opendkim /etc/postfix/dkim/*
    chmod g+r /etc/postfix/dkim/*
    chmod o= /etc/postfix/dkim/*
    ```
    
    Moreover,
    
* the line "Mode sv" was uncommented in `/etc/opendkim.conf` (`s` enables signing of outgoing messages, `v` enables verification of incoming ones)
    * the following lines were appended to the same file:
    
    	```
    	# Specify the list of keys
    	KeyTable file:/etc/postfix/dkim/keytable
    
    	# Match keys and domains. To use regular expressions in the file, use refile: instead of file:
    	SigningTable refile:/etc/postfix/dkim/signingtable
    
    	# Match a list of hosts whose messages will be signed. By default, only localhost is considered as internal host.
    	InternalHosts refile:/etc/postfix/dkim/trustedhosts
    	```
    * the line starting with `Socket` was modified as follows:
    
    	```
    	Socket                  inet:8892@localhost
    	```
    
    Some other files were edited:
    
* `/etc/postfix/dkim/keytable`:

	```
	mail._domainkey.data.beta.grandlyon.com data.beta.grandlyon.com:mail:/etc/postfix/dkim/mail.private
	```

* `/etc/postfix/dkim/signingtable`:

	```
	*@data.beta.grandlyon.com mail._domainkey.data.beta.grandlyon.com
	```
    
    * `/etc/postfix/dkim/trustedhosts`:
    
    	```
    	127.0.0.1
    	192.168.0.0/24
    	```
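
The key table and the signing table work together: OpenDKIM matches the sender address against the signing-table patterns, then looks the resulting key name up in the key table to obtain the signing domain, the selector and the private key path. The sketch below reproduces that lookup in Python; `fnmatch` is used as an approximation of the `*` wildcard accepted by `refile:` tables (it also interprets `?` and `[...]`), and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Contents of /etc/postfix/dkim/signingtable and keytable (copied from above).
SIGNING_TABLE = """\
*@data.beta.grandlyon.com mail._domainkey.data.beta.grandlyon.com
"""
KEY_TABLE = """\
mail._domainkey.data.beta.grandlyon.com data.beta.grandlyon.com:mail:/etc/postfix/dkim/mail.private
"""


def parse_table(text: str) -> list[tuple[str, str]]:
    """Split each non-empty, non-comment line into (left column, right column)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and not parts[0].startswith("#"):
            rows.append((parts[0], parts[1].strip()))
    return rows


def find_signing_key(sender: str) -> Optional[tuple[str, str, str]]:
    """Return (domain, selector, private key path) for a sender, or None."""
    keys = dict(parse_table(KEY_TABLE))
    for pattern, key_name in parse_table(SIGNING_TABLE):
        # fnmatch approximates the "*" wildcard of refile: tables.
        if fnmatchcase(sender.lower(), pattern.lower()):
            domain, selector, path = keys[key_name].split(":", 2)
            return domain, selector, path
    return None
```

With a hypothetical sender `noreply@data.beta.grandlyon.com`, the lookup yields the selector `mail`, which is why the public key is published under `mail._domainkey.data.beta.grandlyon.com`; senders from other domains are not signed.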
    
    The relevant lines in the `postfix` configuration file (`/etc/postfix/main.cf`) read as follows:
    
    ```
    [...]
    myhostname = data.beta.grandlyon.com
    alias_maps = hash:/etc/aliases
    alias_database = hash:/etc/aliases
    myorigin = /etc/mailname
    mydestination = $myhostname, data.beta.grandlyon.com, front-web.localdomain, localhost.localdomain, localhost
    relayhost =
    mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128 192.168.0.0/24
    mailbox_size_limit = 0
    recipient_delimiter = +
    inet_interfaces = all
    inet_protocols = ipv4
    [...]
    milter_default_action = accept
    milter_protocol = 6
    smtpd_milters = inet:127.0.0.1:8892
    non_smtpd_milters = $smtpd_milters
    [...]
    ```
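
`mynetworks` lists the clients that Postfix trusts. With the default `smtpd_relay_restrictions` (which start with `permit_mynetworks`), these clients may relay mail through `front-web`, presumably so that the other machines of the LAN can send e-mails. The following Python sketch, with hypothetical helper names, evaluates that membership test for a given client address.

```python
import ipaddress

# Value of mynetworks in /etc/postfix/main.cf (copied from above).
MYNETWORKS = "127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128 192.168.0.0/24"


def parse_mynetworks(value: str) -> list:
    """Turn Postfix's mynetworks syntax into ipaddress network objects."""
    networks = []
    for token in value.replace(",", " ").split():
        # Postfix writes IPv6 networks as [address]/prefix; drop the brackets.
        networks.append(ipaddress.ip_network(token.replace("[", "").replace("]", "")))
    return networks


def may_relay(client_ip: str) -> bool:
    """True if client_ip belongs to one of the mynetworks entries."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is always False across IP versions, so mixing v4/v6 is safe.
    return any(addr in net for net in parse_mynetworks(MYNETWORKS))
```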
    
The DNS records were updated as follows:
    ```
    data.beta.grandlyon.com. 86400	IN	TXT	"v=spf1 +ip4:51.83.13.51 ~all"
    ```
    
    ```
    mail._domainkey.data.beta.grandlyon.com. 86400 IN TXT "v=DKIM1; h=sha256; k=rsa; " "p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAzzoL8dvkfhm3xCpGxW8COUIgmw4r0PV/5GSUekCA8sLGPiqNh8//Jj4tFpLK6eUMacKYPbL4goUdRyTF5gqh/MdEWwafodZczELETRcp3a7mGdmM2nDhD6lk2Xtdf+nS+HWobYN18a3abNFchcF62LJWGTd4fwKV8gOIIuvTiakVxFuC7eIBUO+7m0JU0EnnivLUabphFSL3yV" "hEdpCD3csRGedSnG6+ocpZw25ll8/5f6WZnobU2d5KKqk7MVgOFXfuJMhdjmd6UvSGPaxR+/E+PsxQCU0f9vLG4R8fLPLh0ngNGGiyNYGHB5Sn8VxIrxqpH2pQKaJsfHLK/IgRJwIDAQAB"
    ```
    
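The first record implements the Sender Policy Framework (SPF), authorizing `51.83.13.51` to send mail on behalf of `data.beta.grandlyon.com`; the second publishes the DKIM public key, which can be found in the file `/etc/postfix/dkim/mail.txt`.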
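
Each character-string of a TXT record is limited to 255 octets, which is why the DKIM record above is split into several quoted strings; verifiers concatenate them without any separator. The Python sketch below (hypothetical helper names) reassembles such a record, parses its DKIM tags, and evaluates only the `ip4` and `all` mechanisms of the SPF record, a deliberately partial subset of SPF. The `p=` value used in the usage example is a placeholder, not the real key.

```python
import ipaddress
import re


def join_txt(record: str) -> str:
    """Concatenate the quoted character-strings of a TXT record (no separator)."""
    return "".join(re.findall(r'"([^"]*)"', record))


def dkim_tags(value: str) -> dict[str, str]:
    """Parse a DKIM key record ("tag=value; ...") into a dict."""
    tags = {}
    for part in value.split(";"):
        if "=" in part:
            tag, val = part.split("=", 1)
            tags[tag.strip()] = "".join(val.split())  # whitespace in values is ignored
    return tags


def spf_ip4_result(spf: str, client_ip: str) -> str:
    """Evaluate only the ip4 and all mechanisms of an SPF record."""
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    addr = ipaddress.ip_address(client_ip)
    for term in spf.split()[1:]:  # skip "v=spf1"
        qualifier = qualifiers.get(term[0], "pass")  # no prefix means "+"
        mechanism = term.lstrip("+-~?")
        if mechanism == "all":
            return qualifier
        if mechanism.startswith("ip4:") and addr in ipaddress.ip_network(mechanism[4:]):
            return qualifier
    return "neutral"  # nothing matched and no "all" term


if __name__ == "__main__":
    record = '"v=DKIM1; h=sha256; k=rsa; " "p=AAAA" "BBBB"'  # placeholder key
    print(dkim_tags(join_txt(record)))
    print(spf_ip4_result("v=spf1 +ip4:51.83.13.51 ~all", "51.83.13.51"))  # pass
```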
    
    ### Reverse Proxy
    
    `nginx` was installed through `apt`. The various "virtual host" configuration files can be found in the `/etc/nginx/sites-available` and `/etc/nginx/sites-enabled` folders. TLS certificates are stored in `/etc/nginx/ssl`.
    
    ## back-office
    
This instance hosts both custom and off-the-shelf applications, as illustrated by the diagram displayed at the beginning of this document. These applications serve several purposes:
    
    * administration, configuration
    * monitoring
    * business
    
The public network interface (`ens3`) was deactivated by commenting out the line `auto ens3` in the `/etc/network/interfaces.d/50-cloud-init.cfg` file. For this change to persist across reboots, cloud-init's network configuration capabilities must be disabled by writing the following content to the file `/etc/cloud/cloud.cfg.d/99-disable-network-config.cfg`:
    ```
    network: {config: disabled}
    ```
    
The private network interface (`ens4`) was statically configured. Here are the relevant lines of the `/etc/network/interfaces` file:
    
    ```
    [...]
    auto ens4
    iface ens4 inet static
    	address 192.168.0.146
    	netmask 255.255.255.0
    	gateway 192.168.0.59
	dns-nameservers 192.168.0.59
    [...]
    ```
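
With a static configuration, the gateway must lie on the same subnet as the interface address; here it is `front-web`'s LAN address, which is also the `dnsmasq` listen address. A small Python sketch (the function name is hypothetical) checks that constraint with the standard `ipaddress` module:

```python
import ipaddress


def lan_network(address: str, netmask: str, gateway: str) -> ipaddress.IPv4Network:
    """Return the interface's network, raising if the gateway is not on-link."""
    # IPv4Interface accepts a dotted netmask, as written in /etc/network/interfaces.
    iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
    if ipaddress.IPv4Address(gateway) not in iface.network:
        raise ValueError(f"gateway {gateway} is not in {iface.network}")
    return iface.network


if __name__ == "__main__":
    print(lan_network("192.168.0.146", "255.255.255.0", "192.168.0.59"))  # 192.168.0.0/24
```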
    
    The `back-office` instance runs Docker and docker-compose, which were installed following the official documentation:
    
    * https://docs.docker.com/install/linux/docker-ce/debian/
    * https://docs.docker.com/compose/install/
    
The default configuration was tweaked in order to prevent the networks created by Docker from clashing with the other networks in use (such as the private LAN). Here is the content of the `/etc/docker/daemon.json` file:
    ```
    {
      "default-address-pools": [
        {
          "scope": "local",
          "base": "172.17.0.0/16",
          "size": 24
        },
        {
          "scope": "global",
          "base": "172.90.0.0/16",
          "size": 24
        }
      ]
    }
    ```
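
Each entry of `default-address-pools` is a `base` range that Docker splits into subnets of prefix length `size`, handing one out to each network it creates. A Python sketch (hypothetical helper names; the `scope` key is not modelled) shows how many `/24` networks each pool provides and checks that no pool overlaps the private LAN:

```python
import ipaddress

# Pools from /etc/docker/daemon.json above, as (base, size) pairs.
POOLS = [("172.17.0.0/16", 24), ("172.90.0.0/16", 24)]
LAN = ipaddress.ip_network("192.168.0.0/24")


def carve(base: str, size: int) -> list:
    """Split a pool into the per-network subnets Docker can hand out."""
    return list(ipaddress.ip_network(base).subnets(new_prefix=size))


def overlapping_pools(pools, lan=LAN) -> list[str]:
    """Return the pool bases that overlap the given LAN."""
    return [base for base, _ in pools if ipaddress.ip_network(base).overlaps(lan)]


if __name__ == "__main__":
    for base, size in POOLS:
        print(base, len(carve(base, size)))  # 256 networks per pool
    print(overlapping_pools(POOLS))          # []
```

A `/16` base carved into `/24` subnets gives 2^(24-16) = 256 networks per pool. A hypothetical pool such as `192.168.0.0/16` would be reported by `overlapping_pools`, since it contains the LAN.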
    
    Moreover, the content of the file `/etc/systemd/system/docker.service.d/startup_options.conf` was edited as follows,
    ```
    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd -H fd:// -H tcp://192.168.0.146:2375
    ```
in order to make the Docker daemon listen on a TCP socket, in addition to the default Unix socket handed over by systemd (`-H fd://`). This allows Portainer to connect to the Docker daemons running on the various Docker-enabled instances of the infrastructure (cf. https://success.docker.com/article/how-do-i-enable-the-remote-api-for-dockerd).
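
With the daemon listening on `tcp://192.168.0.146:2375`, any LAN host can talk to the Docker Engine API over plain HTTP, which is how Portainer reaches it. The Python sketch below (hypothetical helper names; standard library only) turns a `-H` value into a base URL and queries the API's `/version` endpoint. Since this socket is neither encrypted nor authenticated, it should only be reachable from the private LAN.

```python
import json
import urllib.request
from urllib.parse import urlparse


def engine_base_url(host_spec: str) -> str:
    """Convert a dockerd -H value such as tcp://host:2375 into an HTTP base URL."""
    parsed = urlparse(host_spec)
    if parsed.scheme != "tcp":
        raise ValueError(f"not a TCP endpoint: {host_spec}")
    return f"http://{parsed.hostname}:{parsed.port or 2375}"


def engine_version(host_spec: str, timeout: float = 5.0) -> dict:
    """Query the Engine API /version endpoint over plain HTTP (no TLS)."""
    url = engine_base_url(host_spec) + "/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

From a LAN host, `engine_version("tcp://192.168.0.146:2375")["Version"]` should return the daemon's version string.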
    
    ## es-1, es-2, es-3
    
    These three instances host some distributed applications:
    
    * Elasticsearch
    * Kong (backed by the Cassandra database)
    * MinIO
    
    Moreover,
    
* they collect HTTP logs via Filebeat and parse them via Logstash; the parsed logs are then sent to a "small" Elasticsearch instance running on the `back-office` machine for monitoring purposes;
    * they store (cold) backups of the configuration of the entire infrastructure, as well as some of the relevant application data. Backups are performed by `rsnapshot`, which was installed via `apt`. Its setup requires the following steps:
    
1. `rsync` needs to be installed on all the instances of the infrastructure
2. a public SSH key owned by the `root` user of each `es-X` instance must be appended to the `/root/.ssh/authorized_keys` file of all the other instances
3. a first SSH session from each `es-X` instance to all the others must be established, in order to accept each remote host key (i.e. answer "yes" to the host authenticity prompt), which is then stored in `/root/.ssh/known_hosts`
4. the `/etc/rsnapshot.conf` file must be customized according to our needs. Here is a copy of the relevant lines found on `es-1`:
    
    	```
    	[...]
    
    	cmd_ssh /usr/bin/ssh
    
    	[...]
    
    	retain	hourly	6
    	retain	daily	7
    	retain	weekly	4
    	retain	monthly	3
    
    	[...]
    
	backup	/home/	es-1/
	backup	/etc/	es-1/
	backup	/usr/local/	es-1/

	backup	root@es-2.lan:/etc/	es-2/
	backup	root@es-2.lan:/home/	es-2/
	backup	root@es-2.lan:/usr/local/	es-2/

	backup	root@es-3.lan:/etc/	es-3/
	backup	root@es-3.lan:/home/	es-3/
	backup	root@es-3.lan:/usr/local/	es-3/

	backup	root@back-office.lan:/etc/	back-office/
	backup	root@back-office.lan:/home/	back-office/
	backup	root@back-office.lan:/usr/local/	back-office/
	backup	root@back-office.lan:/var/local/docker-apps/	back-office/

	backup	root@front-web.lan:/etc/	front-web/
	backup	root@front-web.lan:/home/	front-web/
	backup	root@front-web.lan:/usr/local/	front-web/
    	```
    
	N.B.: `rsnapshot` requires the fields of `/etc/rsnapshot.conf` to be separated by tabs, not blank spaces.
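
Because space-separated fields are a common source of `rsnapshot` configuration errors, a small pre-check can help. The following Python sketch (hypothetical helper names, not part of `rsnapshot`) flags `backup` and `retain` lines whose fields are not tab-separated, and adds up the `retain` counts to get the maximum number of snapshots kept.

```python
def check_rsnapshot_lines(text: str) -> list[str]:
    """Report backup/retain lines whose fields are not separated by tabs."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        words = line.split()
        if not words or words[0] not in ("backup", "retain"):
            continue
        # Both directives need at least 3 tab-separated fields:
        # "backup <src> <dest>" and "retain <level> <count>".
        fields = [f for f in line.strip().split("\t") if f]
        if len(fields) < 3:
            problems.append(f"line {number}: fields must be tab-separated: {line.strip()!r}")
    return problems


def max_snapshots(text: str) -> int:
    """Upper bound on kept snapshots: the sum of the retain counts."""
    total = 0
    for line in text.splitlines():
        fields = [f for f in line.strip().split("\t") if f]
        if len(fields) == 3 and fields[0] == "retain":
            total += int(fields[2])
    return total
```

Applied to the tab-separated `retain` lines of `es-1`, `max_snapshots` gives 6 + 7 + 4 + 3 = 20 snapshots per backup destination tree. `rsnapshot configtest` remains the authoritative check.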
    
    **The `es-1`, `es-2`, `es-3` instances share the same network and Docker (+ docker-compose) configuration as the `back-office` instance.**
    
    ## Additional notes
    
    The following software packages are installed on all the machines (via `apt`):
    
    * `resolvconf`
    * `prometheus-node-exporter`
    
On the `back-office` and `es-{1,2,3}` instances, `gitlab-runner` was installed following the [official documentation](https://docs.gitlab.com/runner/install/linux-repository.html). The GitLab Runners were then registered as "group runners" associated with the following group: https://gitlab.alpha.grandlyon.com/groups/refonte-data/deployment-beta. The following tags were used, in order to be able to trigger CI jobs on selected machines only:

* data-beta-grandlyon-com-back-office
* data-beta-grandlyon-com-es-1
* data-beta-grandlyon-com-es-2
* data-beta-grandlyon-com-es-3
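
A GitLab CI job is only picked up by runners carrying all of its `tags`, so a deployment can be pinned to a given machine by tagging the job with that machine's runner tag. The Python sketch below builds such a pipeline definition; the job names, the `deploy` stage and the `docker-compose up -d` script are hypothetical, and the output is printed as JSON, which YAML parsers generally accept.

```python
import json

TAG_PREFIX = "data-beta-grandlyon-com-"
HOSTS = ("back-office", "es-1", "es-2", "es-3")  # machines running a group runner


def runner_tag(host: str) -> str:
    """Tag of the group runner registered on a given machine."""
    if host not in HOSTS:
        raise ValueError(f"no runner registered on {host}")
    return TAG_PREFIX + host


def deploy_job(host: str, script: list[str]) -> dict:
    """A .gitlab-ci.yml job pinned to one machine through its runner tag."""
    return {"stage": "deploy", "tags": [runner_tag(host)], "script": script}


# Hypothetical pipeline: one deployment job per Elasticsearch node.
pipeline = {"stages": ["deploy"]}
for host in ("es-1", "es-2", "es-3"):
    pipeline[f"deploy-{host}"] = deploy_job(host, ["docker-compose up -d"])

print(json.dumps(pipeline, indent=2))
```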
    
    
    ## Critical points and potential improvements
    
1. **The `front-web` instance is the single point of failure (SPOF) of the infrastructure. How can we cope with it? Should we use an HA instance? If not, how can we set up an infrastructure with two routers?**
2. Despite the periodic backups performed by `rsnapshot`, restoring data and services after a failure would take a non-negligible amount of time. Some applications are already deployed in High Availability mode:
    	* Kong, thanks to the Cassandra cluster
    	* Elasticsearch, which stores both the (meta)data related to datasets and the editorial content (edited from within the Ghost CMS application)
    
	Others, hosted by the `back-office` instance, are not yet distributed/replicated, but could be in the near future:
    
    	* by deploying the stateless services (mail, AUTHN, CSV catalog download, single page app, ...) on `es-{1,2,3}`;
    	* by deploying PostgreSQL (needed by the "organizations" and "resources wizard" services) in master-slave mode, the slaves being hosted by `es-{1,2,3}` and the master by `back-office` (N.B.: writes to the database come from the "Admin GUI" service);
    	* by deploying Redis (needed by the "legacy AUTH middleware" service) in HA mode, cf. https://redis.io/topics/sentinel.
    
	N.B.: Leaving the administration tools (Konga, Portainer, pgAdmin, Prometheus + Elasticsearch + Grafana) unreplicated is not a big deal.