Automate Leaf and Spine Deployment - Part4

deploying the fabric with ansible

23 February 2021   9 min read

The 4th post in the ‘Automate Leaf and Spine Deployment’ series goes through the creation of the base and fabric config snippets and their deployment to devices. Loopbacks, NVE and intra-fabric interfaces are configured, and both the underlay and overlay routing protocol peerings are formed, leaving the fabric in a state ready for services to be added.


The following sections start with the Ansible host and N9K pre-configurations and go right through to the deployment of the fabric using Ansible.


Prerequisites

The deployment has been tested on NXOS 9.2(4) and NXOS 9.3(5) (in theory it should be fine with 9.3(6) and 9.3(7)) using Ansible 2.10.6 and Python 3.6.9. There are a few nuances when running the different versions of code; see the caveats section in Part1 for more details.

git clone https://github.com/sjhloco/build_fabric.git
mkdir ~/venv/venv_ansible2.10
python3 -m venv ~/venv/venv_ansible2.10
source ~/venv/venv_ansible2.10/bin/activate
pip install -r build_fabric/requirements.txt

Once the environment has been set up with all the packages installed, run napalm-ansible to get the napalm-ansible module and plugin paths and add them to ansible.cfg under [defaults].
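
For reference, napalm-ansible just prints the two plugin paths to add; with the virtual environment created above they would look something along the lines of the below (the exact paths depend on your Python version and venv location, so treat them as illustrative):

[defaults]
library = ~/venv/venv_ansible2.10/lib/python3.6/site-packages/napalm_ansible/modules
action_plugins = ~/venv/venv_ansible2.10/lib/python3.6/site-packages/napalm_ansible/plugins/action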

Before any configuration can be deployed using Ansible a few things need to be manually configured on all N9K devices:

  • Management IP address and default route
  • The features nxapi and scp-server are required for Napalm replace_config
  • Image validation can take a while on NXOS so it is best done beforehand
interface mgmt0
  ip address 10.10.108.11/24
vrf context management
  ip route 0.0.0.0/0 10.10.108.1
feature nxapi
feature scp-server
boot nxos bootflash:/nxos.9.3.5.bin sup-1

  • Leaf and border switches also need the TCAM allocation changed to allow for arp-suppression. This can differ depending on the device model; any changes made need reflecting in /roles/base/templates/nxos/bse_tmpl.j2 to keep it idempotent
hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide
copy run start
reload

The default username/password for all devices is admin/ansible and is stored in the variable bse.users.password. Swap this out for the encrypted type5 password taken from the running config. The username and password used by Napalm to connect to devices are stored in ans.creds_all and will also need changing to match (this is plain-text, or use Ansible Vault).
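
As a rough sketch of where these live (the exact key layout should follow what is already in the var files; the values below are placeholders):

# vars/base.yml - swap the default password for the type5 hash taken from the running config
bse:
  users:
    password: <type5_hash_from_running_config>

# vars/ansible.yml - credentials Napalm uses to connect (plain-text shown, or protect with Ansible Vault)
ans:
  creds_all:
    username: admin
    password: ansible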

Before the playbook can be run the devices’ SSH keys need adding on the Ansible host. The ssh_key_add.yml playbook (in the ssh_keys directory) can be run to add these automatically, you just need to populate the devices’ management IPs in the ssh_hosts file.

sudo apt install ssh-keyscan
ansible-playbook ssh_keys/ssh_key_add.yml -i ssh_keys/ssh_hosts

Base and Fabric roles

Both roles are set up in a similar manner, using the variables defined in base.yml, fabric.yml and host_vars to render Jinja templates that create the config snippets. Tags are defined on the task that imports the role rather than within the role itself, so they apply to all tasks within the role.
There are no filter plugins for these roles, so the templates do have a little bit of programmability in them. This is kept to the bare minimum and is only used for the differences in configuration between spine and leaf/border switches and for the optional settings in bse.services.
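
As a minimal sketch of that kind of template logic (assuming the inventory plugin puts spine switches in a spine group; the feature lines are just an example of leaf/border-only configuration, not lifted from the real template):

{# Illustrative only: VXLAN features are needed on leaf and border switches but not spines #}
{% if 'spine' not in group_names %}
feature nv overlay
feature vn-segment-vlan-based
{% endif %}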

  vars_files:
    - vars/ansible.yml
    - vars/base.yml

  tasks:
    - name: Builds the base config snippet
      import_role:
        name: base
      tags: [bse, bse_fbc, bse_fbc_tnt, bse_fbc_tnt_intf, full]
    - name: Builds the fabric config snippet
      import_role:
        name: fabric
      tags: [fbc, bse_fbc, bse_fbc_tnt, bse_fbc_tnt_intf, full]

The role tasks are pretty simple: they generate the config snippet from the role template and save it to file. The configuration is saved in a device-specific folder within ~/device_configs/device_name/config; the parent directory location can be changed with ans.dir_path.
changed_when stops Ansible reporting changes when the template is rendered and check_mode allows the configuration to still be written to file when the playbook is run in check-mode.

- name: "FBC >> Generating fabric config snippets"
  template:
    src: "{{ ansible_network_os }}/fbc_tmpl.j2"
    dest: "{{ ans.dir_path }}/{{ inventory_hostname }}/config/fabric.conf"
  changed_when: False
  check_mode: False

Interface cleanup role (intf_cleanup)

To keep the playbook truly declarative, any interfaces that are not used need to be reset to their default settings. For example, if the interfaces used for the MLAG were changed, without interface cleanup the old interfaces would not be wiped, breaking the idempotency.

Interfaces used by the fabric can be defined in the following locations (an illustrative snippet of these variables follows the list):

  • Fabric interfaces: Defined under fbc.adv.bse_intf and turned into host_vars by the inventory_plugin
  • MLAG peer-link: Defined under fbc.adv.bse_intf.mlag_peer and turned into host_vars by the inventory_plugin
  • MLAG keepalive: Defined under fbc.adv.bse_intf.mlag_kalive and turned into host_vars by the inventory_plugin
  • End host interfaces: Defined under svc.adv.single_homed and svc.adv.dual_homed and manipulated by the svc_intf_dm method within the format_dm.py filter_plugin
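
To give a feel for where these sit in fabric.yml, the key names below match the dotted paths above but the values are purely illustrative:

fbc:
  adv:
    bse_intf:
      intf_fmt: Ethernet1/        # interface naming format, also used for the cleanup filtering
      mlag_peer: 11-12            # interfaces used for the MLAG peer-link
      mlag_kalive: 13             # interface used for the MLAG keepalive link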

The first task in the intf_cleanup role passes three arguments into the get_intf.py filter_plugin:

  • hostvars[inventory_hostname]: Device host_vars which has the total number of physical interfaces and the used fabric interfaces
  • fbc.adv.bse_intf: Interface naming format (intf_fmt) for filtering and configuration (for example Ethernet1/)
  • flt_svc_intf: End host interfaces (service_interface.yml) generated by the flt_svc_intf method within the format_dm.py filter_plugin from the services role
- name: "Getting interface list"
  block:
  - name: "INTF_CLN >> Getting list of unused interfaces"
    set_fact:
      flt_dflt_intf: "{{ hostvars[inventory_hostname] |get_intf(fbc.adv.bse_intf, flt_svc_intf |default(None)) }}"

These arguments are used to create lists of the used and available interfaces, which are converted into sets and the symmetric_difference (non-duplicates, so the unused interfaces) returned to Ansible and stored in the flt_dflt_intf Ansible fact. This fact is used by the role’s second task to render the dflt_intf_tmpl.j2 template and generate a config snippet of all the unused interfaces.
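
The heavy lifting happens inside get_intf.py, but Ansible’s built-in set theory filters give a feel for the operation; the task below is a standalone illustration with made-up interface lists rather than part of the role:

- name: "Example only >> Symmetric difference of all vs used interfaces"
  set_fact:
    unused_intf: "{{ all_intf | symmetric_difference(used_intf) }}"
  vars:
    all_intf: [Ethernet1/1, Ethernet1/2, Ethernet1/3, Ethernet1/4]
    used_intf: [Ethernet1/1, Ethernet1/2]
# unused_intf becomes ['Ethernet1/3', 'Ethernet1/4'], the interfaces that need defaulting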

  - name: "INTF_CLN >> Generating default interface config snippet"
    template:
      src: "{{ ansible_network_os }}/dflt_intf_tmpl.j2"
      dest: "{{ ans.dir_path }}/{{ inventory_hostname }}/config/dflt_intf.conf"
    changed_when: False
    check_mode: False

The template renders the default interface configuration as taken from show run all; it must match exactly, including the hashed-out lines.

{% for intf in flt_dflt_intf %}
interface {{ intf }}
  !#shutdown
  !#switchport
  switchport mode access
  !#switchport trunk allowed vlan 1-4094
{% endfor %}

The intf_cleanup role is automatically run (using tags) whenever either the fabric or service_interface roles are run.
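
As an indicative example of how that is wired up (the tag list mirrors the pattern of the base and fabric tasks shown earlier rather than being copied from the playbook):

    - name: Builds the interface cleanup config snippet
      import_role:
        name: intf_cleanup
      tags: [fbc, bse_fbc, bse_fbc_tnt, bse_fbc_tnt_intf, full]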

Assembling config snippets

The config snippets are saved within device-specific directories (~/device_configs/device_name/config) with an extension of .conf. This directory is deleted and recreated at every playbook run. The parent directory location can be changed using ans.dir_path.

Ansible assemble takes all files within the config directory that have an extension of .conf and creates a unified configuration file (config.cfg). The order of the snippets in the file does not matter, NXOS is smart enough to work out what is needed. The only gotcha is the order of operations, the same as with manual configuration; for example a VLAN must be created before it can be assigned to an interface.
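
For example, the snippet below would be valid anywhere within config.cfg relative to the other snippets, but within it the VLAN has to be defined before the interface that references it (values are illustrative):

vlan 10
  name example_vlan
interface Ethernet1/10
  switchport
  switchport access vlan 10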

    - name: "SYS >> Joining config snippets into one file"
      assemble:
        src: "{{ ans.dir_path }}/{{ inventory_hostname }}/config"
        dest: "{{ ans.dir_path }}/{{ inventory_hostname }}/config/config.cfg"
        regexp: '\.conf$'
      changed_when: False
      check_mode: False
      tags: [bse_fbc, bse_fbc_tnt, bse_fbc_tnt_intf, full, merge]

Napalm

Napalm replace_config is used to replace the device’s current configuration with the configuration in config.cfg. It is stateless; it doesn’t care what is already configured, just what the end result will be. The device is clever enough to work out the difference and ONLY apply the changes needed. Unless the change is disruptive to a feature (for example changing the BGP ASN) there will be no downtime.

  • If something isn’t relevant anymore it is cleaned (wiped from the device)
  • It only makes changes for the differences, it won’t change any of the existing config that is already in place

Napalm uses SCP to copy over candidate_config.txt and the checkpoint feature to create sot_file and rollback_config.txt. Use show file xx to view these.

show file sot_file                                                   Device config, equivalent of show run all
show file candidate_config.txt                                       Config transferred by Napalm that is to be applied
show file rollback_config.txt                                        Rollback config, the same as the SOT
show diff rollback-patch file sot_file file candidate_config.txt     Diff between the device config and the declared config

API calls are used to copy files over, get the diff and apply the configuration. By default Napalm expects a response to each API call within 60 seconds; this has been increased to 360 seconds as it can take up to 6 minutes to deploy the full configuration (with the service roles). If it takes longer (an N9Kv running 9.2(4) is very slow) Ansible will report the build as failed, but it is likely the process is still running on the device, so give it a minute and run the playbook again; it should pass with no changes needed.

The applied configuration is automatically saved to ~/device_configs/diff/device_name.txt and optionally printed to screen.
Napalm commit_changes is set to True as Ansible check-mode is used to do a dry run. Check-mode will show you what changes would be made by the playbook if committed; it does everything except actually apply the configuration.

    - name: "CFG >> Applying changes using replace config"
      napalm_install_config:
        provider: "{{ ans.creds_all }}"
        dev_os: "{{ ansible_network_os }}"
        timeout: 360
        config_file: "{{ ans.dir_path }}/{{ inventory_hostname }}/config/config.cfg"
        commit_changes: True
        replace_config: True
        diff_file: "{{ ans.dir_path }}/diff/{{ inventory_hostname }}.txt"
        get_diffs: True
      register: changes
      tags: [bse_fbc, bse_fbc_tnt, bse_fbc_tnt_intf, full]

    - debug: var=changes.msg.splitlines()
      tags: [diff]

Ansible Napalm does not have a dedicated method for rolling back changes, so a separate task is required to do this by applying rollback_config.txt.

    - name: "NET >> Rolling back configuration"
      block:
      - net_get:
          src: rollback_config.txt
          dest: "{{ ans.dir_path }}/{{ inventory_hostname }}/config/rollback_config.txt"
        check_mode: False
        connection: network_cli
      - napalm_install_config:
          provider: "{{ ans.creds_all }}"
          dev_os: "{{ ansible_network_os }}"
          timeout: 360
          config_file: "{{ ans.dir_path }}/{{ inventory_hostname }}/config/rollback_config.txt"
          commit_changes: True
          replace_config: True
          diff_file: "{{ ans.dir_path }}/diff/{{ inventory_hostname }}_rollback.txt"
          get_diffs: True
        register: changes
      tags: [rb]

This other Configure NXOS with Napalm post goes into more detail on deploying with Napalm and some of the issues you are likely to come across.

Running playbook

Tags are used to allow only certain roles or combinations of roles to be run. The table below lists the tags that are useful up to this point; there are other tags related to the services roles which are discussed in more detail in Automate Leaf and Spine Deployment Part5 - fabric services: tenant, interface, route.

The base and fabric roles are intrinsically linked so when deploying the only option is to deploy them both (and intf_cleanup).

Ansible tag   Playbook action
bse           Generates the base configuration snippet saved to device_name/config/base.conf
fbc           Generates the fabric and intf_cleanup configuration snippets saved to fabric.conf and dflt_intf.conf
bse_fbc       Generates, joins and applies the base, fabric and intf_cleanup config snippets
rb            Reverses the last applied change by deploying the rollback configuration (rollback_config.txt)
diff          Prints the differences between the current_config (on the device) and desired_config (applied by Napalm) to screen

  • bse and fbc will only generate the config snippet and save it to file. No connections are made to devices or changes applied
  • diff tag can be used with bse_fbc or rb to print the configuration changes to screen
  • Changes are always saved to file whether or not diff is used
  • -C or --check will do everything except actually apply the configuration

Generate the base config: Creates the base config snippet and saves it to base.conf

ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml --tag bse

Generate the fabric config: Creates the fabric and interface cleanup config snippets and saves them to fabric.conf and dflt_intf.conf

ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml --tag fbc

Generate the complete config: Creates the config snippets, assembles them into config.cfg, compares against the device config and prints the diff

ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml --tag 'bse_fbc, diff' -C

Apply the config: Replaces the current config on the device, with the changes made automatically saved to ~/device_configs/diff/device_name.txt

ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml --tag bse_fbc