Recovery
Recovery procedure for AAP
When the configuration of an entire Ansible Automation Platform installation is lost,
you can reinstall it and then piece everything back together by hand, but that takes
weeks and the result will never exactly match the original configuration. Only in rare
cases is everything documented so well that you can rebuild the same situation as
before the crash through the UI. This is exactly the problem that "Configuration as
Code" was devised and created for.
After the reinstallation we can simply run all the configuration as code again. But even
that is quite a time-consuming job in a large environment: because all configuration is
housed in separate repositories, doing all of this by hand becomes a lengthy task. To
prevent this, we have developed this recovery procedure, which reduces the manual
work to a minimum.
A requirement for correct operation is that every organization that is known in
RHAAP has a repository in gitlab with the correct name and content. In addition, a
correct version must be merged to the branch of the target environment that needs to
be restored, otherwise the recovery of that organization will fail.
Steps to a full recovery
A complete recovery of an environment consists of the following steps, which should
preferably be carried out automatically:
1. Restoring a namespace on Openshift (or a VM)
2. Installing and running the operator for AAP (installation playbook)
3. Restoring the configuration of automation hub using config as code
4. Restoring any custom collections
5. Restoring all custom execution environments
6. Obtaining the token (manually in automation hub)
7. Running the config as code of AAP
8. Running the config as code for all teams
The automation of recovery
Of course, it would be useful to be able to start this recovery from a different AAP environment. That way, the loss of one environment can be recovered from another environment that is still running. Since all configuration as code is housed in git, with pipelines that apply it to AAP without the intervention of other systems, we can start the recovery simply by triggering these pipelines in the right order. Because git is the executor and not AAP, there is no need to arrange access between AAP environments.
The big picture
We have already seen which steps are needed to achieve a full recovery of the environment. We are not going to cover the recovery of the RHAAP installation itself here; in this repository we pick up the recovery from the moment AAP is reinstalled and ready to be configured.
This involves the following (automated) steps:
1. Automation Hub Recovery
2. Obtaining the API token
3. Automated recovery of the RHAAP configuration
If you want to trigger a pipeline from outside via the gitlab API, you need to create a
separate gitlab token for each pipeline and store it in the play from which you want to
run this trigger. That amounts to a lot of administration, and the tokens will also
expire. We certainly don't want this: we want to be able to assume that once our code
has been started, it will keep running until everything has been executed. That won't
be the case if new tokens continuously have to be requested; then there is no longer
any truly automatic recovery.
To prevent this, gitlab makes it possible to have projects trigger each other, as if one
were a dependency of the other. No registered token is needed, because a project gets
such a token implicitly. This functionality is intended for dependencies, but we are
going to (ab)use it for the recovery process, because it saves a lot of administration.
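As a minimal sketch (the project path and branch are examples), such a cross-project trigger job in a .gitlab-ci.yml looks like this:

stages:
  - trigger_downstream

trigger_downstream:
  stage: trigger_downstream
  trigger:
    project: cac/cac-ahub-config   # downstream config as code project (example)
    branch: test                   # branch of the environment to restore (example)
    strategy: depend               # this pipeline waits for the downstream result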
What are we going to do?
1. We create a new gitlab project for this recovery step
2. We write a pipeline with a dependency on the gitlab project (or projects) we want to execute
3. We push it to gitlab
4. We wait for the pipeline to be executed
5. We delete this temporary project
We do this for every recovery step. For the recovery steps, we create job template(s) in RHAAP with a survey for each environment. We only do this in the MGT (management) organization, so that only system administrators have access. We have brought the recovery steps together in one repository, where the steps are recorded in separate playbooks. We explain them here.
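As an illustration, such a job template could itself be described in configuration as code. The sketch below assumes the variable format of the controller_configuration collections; the organization, project and inventory names are examples:

controller_templates:
  - name: "Recover Automation Hub"        # example job template name
    organization: "MGT"                   # management organization only
    project: "aap-recovery"               # hypothetical project containing the recovery playbooks
    inventory: "localhost"                # the 'empty' inventory described below
    playbook: "recover_ahub.yml"
    survey_enabled: true
    survey_spec:
      name: ""
      description: ""
      spec:
        - question_name: "Environment to restore"
          variable: "gitlab_env_branch"
          type: "multiplechoice"
          choices:
            - dev
            - test
            - accp
            - prod
          required: true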
Generic data
Both playbooks need some configuration data that defines the environment they need to function in. Since it's almost the same for both, we've merged it into 1 file: env_vars.yml
---
# put your vars in here and make sure this file is ALWAYS vault encrypted
# the values in this file will be encrypted and used in the config files.
gitlab_protocol: 'https://'
# ensure the gitlab_url has the final "/"
gitlab_url: 'gitlab.example.com/'
gitlab_user: <username_gitlab_svc_account>
gitlab_password: <passwd_gitlab_svc_account>
gitlab_group: 'CaC'
aap_base_config: aap-base-config
org_project_name_prefix: aap-config-
ahub_project_name: run_ahub_recovery
aap_project_name: run_aap_recovery
# List repositories to run the pipeline for, including the gitlab group
# name.
repositories:
  - CaC/cac-ahub-config
  # Examples for additional repositories:
  #- images/ee_cac_image
  #- collections/linux.infra
  #- collections/linux.rhel
controllers:
  dev:
    host: controller.dev.example.com
    passwd: <admin password>
  test:
    host: controller.test.example.com
    passwd: <admin password>
  accp:
    host: controller.accp.example.com
    passwd: <admin password>
  prod:
    host: controller.prod.example.com
    passwd: <admin password>
Above is the data that makes operation within the environment possible. First of all, the gitlab environment is defined. The user data is important: a user must be created in gitlab who has sufficient rights to perform all of this. Creating and using such a service account prevents someone from having to use their personal credentials, which would of course be even more undesirable. To keep this secure, this data must be encrypted.

Secondly, the setup of gitlab for config as code is defined. The configuration as code repositories should be housed in this gitlab group. This also keeps the code simple, without too much administration. The repositories variable is an exception to that rule: it covers not only the config as code, but also the organization's 'own' content. Self-built collections and execution environments must also be available in the automation hub before you can fully restore the controller, so we keep track of all of them here. This is the only piece of administration that we will have to maintain. The controllers variable is used to read in the correct configuration after the base config has been restored, so that only the configured organizations will be restored.
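As mentioned, env_vars.yml must always be vault encrypted; this can be done with the standard ansible-vault tooling, for example:

# encrypt the variable file in place (prompts for a vault password)
ansible-vault encrypt env_vars.yml

# later changes can be made without decrypting the file on disk
ansible-vault edit env_vars.yml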
hosts.yaml:
all:
  hosts:
    localhost:
When we talk about an 'empty' inventory, this is its most basic form. Since ansible requires an inventory, we give it one. This is where the generic part of this repository ends, and we move on to the recovery parts.
Recovery code automation hub
Below is the playbook that is used to fully automate the recovery of the automation hub. For the explanation we chop this playbook into functional pieces; if you put the pieces back together, you have the full playbook. recover_ahub.yml:
---
- hosts: localhost
  gather_facts: false
  pre_tasks:
    - name: Get vars
      ansible.builtin.include_vars: env_vars.yml
    - name: "Create GitLab Project in group {{ gitlab_group }}"
      community.general.gitlab_project:
        api_url: "{{ gitlab_protocol }}{{ gitlab_url }}"
        validate_certs: false
        api_username: "{{ gitlab_user }}"
        api_password: "{{ gitlab_password }}"
        name: "{{ ahub_project_name }}"
        group: "{{ gitlab_group }}"
        default_branch: "{{ gitlab_env_branch }}"
        shared_runners_enabled: true
        initialize_with_readme: true
        state: present
    - name: Clone the new gitlab repository
      ansible.builtin.git:
        repo: "{{ gitlab_protocol }}{{ gitlab_user }}:{{ gitlab_password }}@{{ gitlab_url }}{{ gitlab_group }}/{{ ahub_project_name }}.git"
        dest: "/tmp/{{ ahub_project_name }}"
        version: "{{ gitlab_env_branch }}"
        clone: true
        update: true
      environment:
        GIT_SSL_NO_VERIFY: true
In the above part of the playbook, we create a git repository and clone it to a temporary directory. Almost all of the variables used here come from env_vars.yml; the only variable that needs to be passed to the playbook is the environment that needs to be restored, in the variable 'gitlab_env_branch'.
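When the playbook is run directly instead of from the job template in RHAAP, a possible invocation could look like this (the vault password is needed because env_vars.yml is encrypted):

# hypothetical direct invocation; normally this runs from a job template with a survey
ansible-playbook -i hosts.yaml recover_ahub.yml \
  -e gitlab_env_branch=test \
  --ask-vault-pass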
  tasks:
    - name: Template the gitlab-ci.yml for all repositories to recover
      ansible.builtin.template:
        src: gitlab-ci-ahub.yml.j2
        dest: "/tmp/{{ ahub_project_name }}/.gitlab-ci.yml"
        mode: '0644'
    - name: Push the updated GitLab repository
      ansible.builtin.shell: |
        git config --global user.name "{{ gitlab_user }}"
        git config --global user.email "{{ gitlab_user }}@example.com"
        git add --all
        git commit -m 'initial config'
        git -c http.sslVerify=false push origin "{{ gitlab_env_branch }}"
      args:
        chdir: "/tmp/{{ ahub_project_name }}"
      changed_when: false
    - name: Delete the temporary directory
      ansible.builtin.file:
        path: "/tmp/{{ ahub_project_name }}"
        state: absent
    - name: Sleep for 10 sec
      ansible.builtin.pause:
        seconds: 10
Once the repository is cloned locally, files can be written into it. We do that here too, but in this case only one file: the pipeline that will be started when this repository is pushed back to git. Because we use a template that defines our pipeline, we can trigger other repositories from this pipeline, as if they were dependencies of this repository, which causes their pipelines to run for the specified branch. Let that be the configuration as code for the automation hub that we want to restore... All repositories specified in env_vars.yml will be triggered, because the template reads them in a loop.

After pushing the modified file(s) in the local copy of the repository, the pipeline in git will be triggered by the update. We no longer need the local directory, so we delete it. Finally, we wait 10 seconds to give gitlab a chance to create and launch the pipeline.
    - name: GitLab Post | Obtain Access Token
      ansible.builtin.uri:
        url: "{{ gitlab_protocol }}{{ gitlab_url }}oauth/token"
        method: POST
        validate_certs: false
        body_format: json
        headers:
          Content-Type: application/json
        body: >
          {
            "grant_type": "password",
            "username": "{{ gitlab_user }}",
            "password": "{{ gitlab_password }}"
          }
      register: gitlab_access_token
    - name: Store the token in var
      ansible.builtin.set_fact:
        token: "{{ gitlab_access_token.json.access_token }}"
    - name: Check the pipeline until it has run
      ansible.builtin.uri:
        url: "{{ gitlab_protocol }}{{ gitlab_url }}api/v4/projects/{{ gitlab_group }}%2F{{ ahub_project_name }}/pipelines"
        validate_certs: false
        headers:
          Authorization: "Bearer {{ token }}"
      register: _jobs_list
      until: _jobs_list.json[0].status == "success"
      retries: 120
      delay: 20
To make sure that everything goes according to plan, and because we want to see the status in the controller, we follow the pipeline so that its status can also be reported back to the controller.
- name: "Delete GitLab Project in group {{ gitlab_group }}"
community.general.gitlab_project:
api_url: "{{ gitlab_protocol }}{{ gitlab_url }}"
validate_certs: true
api_username: "{{ gitlab_user }}"
api_password: "{{ gitlab_password }}"
name: "{{ ahub_project_name }}"
group: "{{ gitlab_group }}"
default_branch: "{{ gitlab_env_branch }}"
shared_runners_enabled: true
initialize_with_readme: true
state: absent
Finally, we make sure that the recovery project in gitlab is neatly deleted, which also confirms that the playbook finished cleanly. If the project still exists, an error has occurred somewhere, and the pipeline logs of this project can be used to see which part did not come to a successful conclusion. The only thing we still need for execution is a correct template for the pipeline that is written into the repository.
gitlab-ci-ahub.yml.j2:
# Pull the ansible config as code image
# change this to suit your installation, for a runner on openshift or docker
# you will need an image
# image: localhost:5000/ansible-image:1.0

# List of pipeline stages
stages:
{% for repo in repositories %}
  - {{ repo | lower }}
  - sleep_{{ repo | lower }}
{% endfor %}

{% for repo in repositories %}
{{ repo | lower }}:
  stage: {{ repo | lower }}
  trigger:
    project: {{ repo | lower }}
    branch: {{ gitlab_env_branch }}
    strategy: depend
  when: always

sleep_{{ repo | lower }}:
  stage: sleep_{{ repo | lower }}
  script:
    - sleep 20
  when: always
{% endfor %}
The template shown above ensures that the projects specified in the repositories variable are called in order, which initiates the pipeline of each of those projects for the given branch. There is a 20-second pause after each call to such a dependent project. This is not always necessary and is probably environment dependent; in the environment where this was built and tested, a pause between some of the projects was indeed needed. Since we can't easily determine 'some', we pause after all of them.
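To make this concrete: with the example env_vars.yml above (only CaC/cac-ahub-config in repositories) and gitlab_env_branch set to test, the template would render to roughly the following .gitlab-ci.yml (note that the project paths are lowercased):

stages:
  - cac/cac-ahub-config
  - sleep_cac/cac-ahub-config

cac/cac-ahub-config:
  stage: cac/cac-ahub-config
  trigger:
    project: cac/cac-ahub-config
    branch: test
    strategy: depend
  when: always

sleep_cac/cac-ahub-config:
  stage: sleep_cac/cac-ahub-config
  script:
    - sleep 20
  when: always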
Obtain API Token
Not all environments allow you to specify in advance which token should be used in
the automation hub, so a manual step has to be inserted here. If you were able to set it
during the installation, keep it the same as the token you specified on the gitlab group
for the config as code; in that case this step is no longer necessary and you can
continue fully automatically...
The big manual step...
Log in to the newly configured automation hub, generate a new token and commit it to the gitlab group for the Configuration As Code repositories. As a result, the new token will be used by all pipelines that are yet to be executed.
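How the new token reaches the config as code pipelines depends on how you have set this up. One possible approach, assuming the pipelines read it from a group-level CI/CD variable (the variable name AH_TOKEN and the variable new_ah_token are examples), is a task like this:

- name: Store the new automation hub token as a group-level CI/CD variable
  ansible.builtin.uri:
    url: "{{ gitlab_protocol }}{{ gitlab_url }}api/v4/groups/{{ gitlab_group }}/variables"
    method: POST
    validate_certs: false
    headers:
      Authorization: "Bearer {{ token }}"   # a gitlab API token, obtained as in the playbooks above
    body_format: form-urlencoded
    body:
      key: AH_TOKEN                         # example variable name
      value: "{{ new_ah_token }}"           # the token generated in the automation hub
      masked: "true"
    status_code: [201]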
Then we can continue with the playbook below to restore the controller configuration.
Recover Code AAP Configuration
Broadly speaking, this playbook does for the controller what the previous one did for
the automation hub. There is one small difference: this playbook needs very little
input, just where it can find the repositories that follow the naming convention and
the base configuration repository; the rest it figures out itself.

We're going to chop it up again and explain the 'magic' of each part. Unfortunately we
lose some of the magic along the way, but hopefully it becomes clear what exactly we
are doing. Knowing what it does also means being able to recover when things don't
go so well.

Because the organizations that have been configured cannot be retrieved until the
base configuration is loaded into the controller, this playbook is split into two phases.
First the base configuration has to be loaded.
recover_aap.yml:
---
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Get vars
      ansible.builtin.include_vars: env_vars.yml
    # Reconfigure controller from base config
    - name: "Create GitLab Project in group {{ gitlab_group }}"
      community.general.gitlab_project:
        api_url: "{{ gitlab_protocol }}{{ gitlab_url }}"
        validate_certs: false
        api_username: "{{ gitlab_user }}"
        api_password: "{{ gitlab_password }}"
        name: "{{ aap_project_name }}_base"
        group: "{{ gitlab_group }}"
        default_branch: "{{ gitlab_env_branch }}"
        shared_runners_enabled: true
        initialize_with_readme: true
        state: present
    - name: Clone the new gitlab repository
      ansible.builtin.git:
        repo: "{{ gitlab_protocol }}{{ gitlab_user }}:{{ gitlab_password }}@{{ gitlab_url }}{{ gitlab_group }}/{{ aap_project_name }}_base.git"
        dest: "/tmp/{{ aap_project_name }}_base"
        version: "{{ gitlab_env_branch }}"
        clone: true
        update: true
      environment:
        GIT_SSL_NO_VERIFY: true
As with the recovery of the automation hub, we create a temporary repository and clone it to a temporary directory. This allows us to add the files locally and then push them to gitlab.
    - name: Template the gitlab-ci.yml for the base configuration
      ansible.builtin.template:
        src: gitlab-ci_base.yml.j2
        dest: "/tmp/{{ aap_project_name }}_base/.gitlab-ci.yml"
        mode: '0644'
    - name: Push the updated GitLab repository
      ansible.builtin.shell: |
        git config --global user.name "{{ gitlab_user }}"
        git config --global user.email "{{ gitlab_user }}@example.com"
        git add --all
        git commit -m 'initial config'
        git -c http.sslVerify=false push origin "{{ gitlab_env_branch }}"
      args:
        chdir: "/tmp/{{ aap_project_name }}_base"
      changed_when: false
    - name: Delete the temporary directory
      ansible.builtin.file:
        path: "/tmp/{{ aap_project_name }}_base"
        state: absent
    - name: Sleep for 60 sec
      ansible.builtin.pause:
        seconds: 60
Above, the pipeline file that will trigger the base configuration as code project as a dependency is templated into the repository. After pushing the new file to the gitlab repository, the pipeline will be created. After this, we can delete the temporary folder, so that we leave everything tidy. We wait a little longer here to give gitlab a chance to create and launch the pipeline.
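The template gitlab-ci_base.yml.j2 itself is not reproduced here. A minimal sketch of what it could look like, assuming the base configuration lives in the aap_base_config project inside the gitlab_group group, is:

# gitlab-ci_base.yml.j2 (sketch)
stages:
  - base_config

base_config:
  stage: base_config
  trigger:
    project: {{ gitlab_group | lower }}/{{ aap_base_config }}
    branch: {{ gitlab_env_branch }}
    strategy: depend
  when: always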
    - name: GitLab Post | Obtain Access Token
      ansible.builtin.uri:
        url: "{{ gitlab_protocol }}{{ gitlab_url }}oauth/token"
        method: POST
        validate_certs: false
        body_format: json
        headers:
          Content-Type: application/json
        body: >
          {
            "grant_type": "password",
            "username": "{{ gitlab_user }}",
            "password": "{{ gitlab_password }}"
          }
      register: gitlab_access_token
    - name: Store the token in var
      ansible.builtin.set_fact:
        token: "{{ gitlab_access_token.json.access_token }}"
    - name: Check the pipeline until it has run
      ansible.builtin.uri:
        url: "{{ gitlab_protocol }}{{ gitlab_url }}api/v4/projects/{{ gitlab_group }}%2F{{ aap_project_name }}_base/pipelines"
        validate_certs: false
        headers:
          Authorization: "Bearer {{ token }}"
      register: _jobs_list
      until: _jobs_list.json[0].status == "success"
      retries: 120
      delay: 20
To make sure that the base configuration is loaded correctly before proceeding, we will monitor the pipeline. If it returns success, we can move on to the part to load the organizations.
- name: "Delete GitLab Project in group {{ gitlab_group }}"
community.general.gitlab_project:
api_url: "{{ gitlab_protocol }}{{ gitlab_url }}"
validate_certs: true
api_username: "{{ gitlab_user }}"
api_password: "{{ gitlab_password }}"
name: "{{ aap_project_name }}_base"
group: "{{ gitlab_group }}"
default_branch: "{{ gitlab_env_branch }}"
shared_runners_enabled: true
initialize_with_readme: true
state: absent
We delete the temporary project after it has run successfully. If it is still present, this part has not completed correctly and the second part will not have started either. In the case of the automation hub the play ended here, but we still have to load the configuration of all created organizations, which we do fully automatically from here on.
    # Recover all configured orgs
    # Read them from controller
    - name: "Create GitLab Project in group {{ gitlab_group }}"
      community.general.gitlab_project:
        api_url: "{{ gitlab_protocol }}{{ gitlab_url }}"
        validate_certs: false
        api_username: "{{ gitlab_user }}"
        api_password: "{{ gitlab_password }}"
        name: "{{ aap_project_name }}"
        group: "{{ gitlab_group }}"
        default_branch: "{{ gitlab_env_branch }}"
        shared_runners_enabled: true
        initialize_with_readme: true
        state: present
    - name: Clone the new gitlab repository
      ansible.builtin.git:
        repo: "{{ gitlab_protocol }}{{ gitlab_user }}:{{ gitlab_password }}@{{ gitlab_url }}{{ gitlab_group }}/{{ aap_project_name }}.git"
        dest: "/tmp/{{ aap_project_name }}"
        version: "{{ gitlab_env_branch }}"
        clone: true
        update: true
      environment:
        GIT_SSL_NO_VERIFY: true
We recreate the temporary repository and clone it to a temporary directory, so that we can add the files locally and then push them to gitlab.
    - name: Controller | Read the organizations list
      ansible.builtin.uri:
        url: "https://{{ controllers[gitlab_env_branch]['host'] }}/api/v2/organizations/"
        user: admin
        password: "{{ controllers[gitlab_env_branch]['passwd'] }}"
        method: GET
        body_format: json
        force_basic_auth: true
        validate_certs: false
      register: _controller_organizations
    - name: Get the list of dicts
      ansible.builtin.set_fact:
        organizations: "{{ _controller_organizations.json.results }}"
We log in to the controller, look up which organizations it contains and store the result in the variable organizations. With this list of configured organizations we perform the following tasks:
    - name: Template the gitlab-ci.yml for all organizations
      ansible.builtin.template:
        src: gitlab-ci.yml_org.yml.j2
        dest: "/tmp/{{ aap_project_name }}/.gitlab-ci.yml"
        mode: '0644'
    - name: Push the updated GitLab repository
      ansible.builtin.shell: |
        git config --global user.name "{{ gitlab_user }}"
        git config --global user.email "{{ gitlab_user }}@example.com"
        git add --all
        git commit -m 'initial config'
        git -c http.sslVerify=false push origin "{{ gitlab_env_branch }}"
      args:
        chdir: "/tmp/{{ aap_project_name }}"
      changed_when: false
With the list and the template, we create a new pipeline file in the repository and push it to gitlab, which builds and starts the new pipeline.
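The template gitlab-ci.yml_org.yml.j2 is not reproduced here either. A minimal sketch of what it could look like, assuming every organization has a config as code repository named after the org_project_name_prefix plus the organization name inside the gitlab_group group, is:

# gitlab-ci.yml_org.yml.j2 (sketch)
stages:
{% for org in organizations %}
  - {{ org.name | lower }}
{% endfor %}

{% for org in organizations %}
{{ org.name | lower }}:
  stage: {{ org.name | lower }}
  trigger:
    project: {{ gitlab_group | lower }}/{{ org_project_name_prefix }}{{ org.name | lower }}
    branch: {{ gitlab_env_branch }}
    strategy: depend
  when: always
{% endfor %}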
    - name: Delete the temporary directory
      ansible.builtin.file:
        path: "/tmp/{{ aap_project_name }}"
        state: absent
    - name: Sleep for 10 sec
      ansible.builtin.pause:
        seconds: 10
We remove the temporary local clone of the repository and wait a moment before continuing, to give gitlab time to create and launch the pipeline.
    - name: GitLab Post | Obtain Access Token
      ansible.builtin.uri:
        url: "{{ gitlab_protocol }}{{ gitlab_url }}oauth/token"
        method: POST
        validate_certs: false
        body_format: json
        headers:
          Content-Type: application/json
        body: >
          {
            "grant_type": "password",
            "username": "{{ gitlab_user }}",
            "password": "{{ gitlab_password }}"
          }
      register: gitlab_access_token
    - name: Store the token in var
      ansible.builtin.set_fact:
        token: "{{ gitlab_access_token.json.access_token }}"
    - name: Check the pipeline until it has run
      ansible.builtin.uri:
        url: "{{ gitlab_protocol }}{{ gitlab_url }}api/v4/projects/{{ gitlab_group }}%2F{{ aap_project_name }}/pipelines"
        validate_certs: false
        headers:
          Authorization: "Bearer {{ token }}"
      register: _jobs_list
      until: _jobs_list.json[0].status == "success"
      retries: 120
      delay: 20
We again wait and poll the pipeline until it reaches the correct state; if it returns success, everything has gone well and the configuration of all organizations has been restored.
- name: "Delete GitLab Project in group {{ gitlab_group }}"
community.general.gitlab_project:
api_url: "{{ gitlab_protocol }}{{ gitlab_url }}"
validate_certs: true
api_username: "{{ gitlab_user }}"
api_password: "{{ gitlab_password }}"
name: "{{ aap_project_name }}"
group: "{{ gitlab_group }}"
default_branch: "{{ gitlab_env_branch }}"
shared_runners_enabled: true
initialize_with_readme: true
state: absent
Again, we delete the temporary repository if everything went well. This is how we leave everything clean behind us.