Napalm offers an easy way to configure and gather information from network devices using a unified API. No matter which vendor it is used against, the input task and returned output will be the same. The only things that are not vendor neutral are the actual commands run and the configuration being applied. This post documents my experiences of trying to replace the whole configuration on NXOS using Napalm with Ansible.
Napalm operates using the nxapi and scp-server features, so these must be enabled for Napalm to work. I have been using N9Kv 9.2.4 on EVE-NG and find the API to be slow at times and buggy, in that it can intermittently break when using config_replace. Even though the service was still running and NXOS said the port was open, you couldn’t telnet to port 443. Removing the command nxapi use-vrf management improved stability; as the VRF is optional, removing it doesn’t affect pushing config via the management interface.
- feature nxapi: Enable the NXOS API
- feature scp-server: Allows for the copying of files used for diff and config_replace
- show nxapi: See the certificate, port number and timeout settings
- show nxapi-server logs: Show logs of past API connections
The main options when using Napalm within Ansible are pretty much the same as when it is used natively with Python. The rollback feature hasn’t been ported into Ansible; however, it does still create rollback_config.txt, meaning a rollback can be done by applying this configuration file.
- name: "NET >> Apply the configuration"
  napalm_install_config:                              # Used to deploy the final configuration
    provider: '{{ creds }}'                           # The provider, a dictionary of auth creds set in group_var
    dev_os: '{{ os }}'                                # The network device type (os) set in host_var
    timeout: 60                                       # Default timeout to wait for a response is 60 seconds
    config_file: '{{ host_tmpdir }}/assembled.conf'   # File containing the configuration that is to be applied to this device
    commit_changes: true                              # true or false. True will apply the changes, false will discard the changes after doing a diff
    replace_config: true                              # true or false (Optional). True to replace the entire config or False to just merge with it (default is False)
    get_diffs: true                                   # true or false (Optional). True compares diffs of current and new config, use -v or register to print (default is True)
    diff_file: '{{ host_tmpdir }}/diff'               # (Optional) Writes the results of diff to a file called diff (also need get_diffs to be enabled)
  register: changes
- debug: var=changes.msg.splitlines()                 # Prints the differences to CLI
commit_changes: False means it won’t commit changes; even if this is set to True, it is overridden by Ansible check-mode.
get_diffs is enabled by default; use -v or register and debug to view the diffs. The output will be just one long string, so use splitlines with debug to make it more human readable.
register: changes
- debug: var=changes.msg.splitlines()
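To see why splitlines helps, here is a minimal Python sketch of the same operation outside Ansible. The diff string is a made-up sample for illustration, not real device output.

```python
# Hypothetical diff, shaped like the single long string described above.
raw_diff = "+hostname DC1-N9K-LEAF01\n-hostname switch\n+feature nxapi"

# splitlines() turns the one string into a list with one entry per line,
# which is what makes the Ansible debug output readable.
diff_lines = raw_diff.splitlines()

for line in diff_lines:
    print(line)
```

Ansible prints a list one element per line, whereas a plain string is printed with the newlines escaped.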
A better option for complex or long changes is to save the diff to a file using diff_file (get_diffs still needs to be enabled).
diff_file: "{{ ans.dir_path }}/{{ inventory_hostname }}/diff.txt"
Replace
Nexus configuration files are checkpoint files. NXOS uses its rollback feature to create archives (checkpoints) and to roll back between checkpoint file versions without needing a reboot. Napalm makes use of this feature to perform replace_config.
nxapi and scp-server must be included in any config files pushed to NXOS, as these features are used to connect and copy the configuration files over to the device. Napalm makes API calls to copy the files over, get the diff and apply the config. By default Napalm expects a response to each API call within 60 seconds; when applying large config files it is likely this timeout will need to be increased.
- name: "CFG >> Applying changes using replace config"
  napalm_install_config:
    provider: "{{ ans.creds_all }}"
    dev_os: "{{ ansible_network_os }}"
    timeout: 240
    config_file: "{{ ans.dir_path }}/{{ inventory_hostname }}/config/config.cfg"
    commit_changes: True
    replace_config: True
    diff_file: "{{ ans.dir_path }}/diff/{{ inventory_hostname }}.txt"
    get_diffs: True
  register: changes
- debug: var=changes.msg.splitlines()
  tags: [diff]
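Because a replace that drops nxapi or scp-server cuts off Napalm's own access, it can be worth checking the rendered config before running the play. The following is a minimal sketch; the function name and sample config are hypothetical.

```python
# Features Napalm relies on to reach NXOS and copy files over.
REQUIRED_FEATURES = ("feature nxapi", "feature scp-server")


def missing_features(config_text):
    """Return the required feature lines that are absent from the config."""
    present = {line.strip() for line in config_text.splitlines()}
    return [feat for feat in REQUIRED_FEATURES if feat not in present]


# Hypothetical candidate config that forgot scp-server.
sample = "hostname DC1-N9K-LEAF01\nfeature nxapi\nfeature bgp\n"
print(missing_features(sample))  # ['feature scp-server']
```

An empty list means both features are present in the file.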
The command order in the desired state config file does not have to match the current configuration; NXOS is smart enough to compare commands. The only gotcha is when a line of config relies on another part of the configuration, such as creating a VLAN before applying it to an interface.
These two lines MUST be at the start of the configuration file or the deployment will fail.
!Command: Checkpoint cmd vdc 1
version 9.2(4) Bios:version
- Without the !Command line, NXOS won’t recognize candidate_config.txt as a checkpoint file.
- Without the version line, NXOS does 'no hostname', which causes a failure due to: Syntax error while parsing 'vdc DC1-N9K-LEAF01 id 1'
If !Command: Checkpoint cmd vdc 1 is missing, the deployment will fail with a 500 response code. The error on the device will be:
ERROR: Rollback patch computation failed due to the following reason(s)
The checkpoint file was not created using checkpoint CLI
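A quick pre-flight check for the two header lines can catch this before the play runs. This is a sketch only; the function name is hypothetical and it only checks that the second line starts with "version", not which version it names.

```python
def has_checkpoint_header(config_text):
    """Check that the first two lines are the checkpoint and version lines."""
    lines = config_text.splitlines()
    if len(lines) < 2:
        return False
    return (
        lines[0].startswith("!Command: Checkpoint cmd vdc 1")
        and lines[1].startswith("version ")
    )


good = "!Command: Checkpoint cmd vdc 1\nversion 9.2(4) Bios:version\nhostname DC1-N9K-LEAF01\n"
bad = "hostname DC1-N9K-LEAF01\nfeature nxapi\n"
print(has_checkpoint_header(good), has_checkpoint_header(bad))  # True False
```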
When Napalm is run to replace the configuration (replace_config: true) the following commands are applied on the device:
- scp -t bootflash:/candidate_config.txt
- delete bootflash:/sot_file
- checkpoint file bootflash:/sot_file
- checkpoint file bootflash:/rollback_config.txt
- rollback running-config file bootflash:/candidate_config.txt
- copy running-config startup-config
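The same workflow can be driven natively from Python with the standard Napalm calls load_replace_candidate, compare_config, commit_config and discard_config. The sketch below wraps them in a hypothetical helper and uses a stand-in FakeDevice class (not part of Napalm) so that it runs without a switch.

```python
def replace_config(device, candidate_path, commit=False):
    """Load a full candidate config, get the diff, then commit or discard."""
    device.load_replace_candidate(filename=candidate_path)
    diff = device.compare_config()
    if commit and diff:
        device.commit_config()
    else:
        device.discard_config()
    return diff


class FakeDevice:
    """Stand-in for a Napalm driver that just records which calls were made."""

    def __init__(self, diff):
        self.diff = diff
        self.calls = []

    def load_replace_candidate(self, filename=None, config=None):
        self.calls.append("load_replace_candidate")

    def compare_config(self):
        self.calls.append("compare_config")
        return self.diff

    def commit_config(self):
        self.calls.append("commit_config")

    def discard_config(self):
        self.calls.append("discard_config")


dry_run = FakeDevice(diff="+vlan 10")
replace_config(dry_run, "config.cfg", commit=False)
print(dry_run.calls)
```

Against a real device, the object passed in would be an opened driver from get_network_driver('nxos') rather than FakeDevice.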
When using the replace method you need to get used to failure; it is going to happen a lot in the early stages and when adding new features. A lot of the issues arise from the hidden default configuration, so it is helpful to use show run all and/or show file sot_file to work out what the full configuration should look like. The two most common failure scenarios you will come across are:
- Something stopped the code being deployed on the NXOS and reverted
fatal: [DC1-N9K-SPINE01]: FAILED! => {"changed": false, "msg": "cannot install config: Invalid status code returned on NX-API POST\ncommands: ['terminal dont-ask', 'rollback running-config file candidate_config.txt', 'no terminal dont-ask']\nstatus_code: 500"}
- Lost access to the device (commands issued broke your access) or it took longer than 60 seconds to apply (likely config was still applied)
fatal: [DC1-N9K-SPINE02]: FAILED! => {"changed": false, "msg": "cannot install config: HTTPSConnectionPool(host='10.10.108.12', port=443): Read timed out. (read timeout=60)"}
You can use show rollback status to see how long the API call actually took to apply and adjust the Napalm timeout accordingly.
DC1-N9K-LEAF01# show rollback status
Last operation : Rollback to file
Details:
Rollback type: atomic candidate_config.txt
Start Time: Sun Sep 13 09:35:27 2021
End Time: Sun Sep 13 09:38:07 2021
Operation Status: Success
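Working out the timeout from this output is simple arithmetic, sketched below in Python. The function name and the 50% safety margin are my own assumptions, not anything Napalm or NXOS prescribes.

```python
from datetime import datetime
import math


def rollback_duration_seconds(status_output):
    """Seconds between 'Start Time' and 'End Time' in show rollback status."""
    times = {}
    for line in status_output.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key in ("Start Time", "End Time"):
            # Drop the weekday name and parse e.g. 'Sep 13 09:35:27 2021'
            stamp = " ".join(value.split()[1:])
            times[key] = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return int((times["End Time"] - times["Start Time"]).total_seconds())


output = """Last operation : Rollback to file
Details:
  Rollback type: atomic candidate_config.txt
  Start Time: Sun Sep 13 09:35:27 2021
  End Time: Sun Sep 13 09:38:07 2021
  Operation Status: Success"""

duration = rollback_duration_seconds(output)
# Hypothetical 50% safety margin on top of the measured time
suggested_timeout = math.ceil(duration * 1.5)
print(duration, suggested_timeout)  # 160 240
```

Here the rollback took 160 seconds, well over the default 60 second timeout.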
For failures the best bet is to log into the NXOS device and see if you can work it out from the logs and files. The sot_file, candidate_config.txt and rollback_config.txt files are created whenever Napalm runs.
- show file sot_file: Device configuration, equivalent of show run all
- show file candidate_config.txt: Configuration transferred by Napalm that was to be applied
- show file rollback_config.txt: Rollback config created before applying the change (is the same as sot_file)
- show diff rollback-patch file sot_file file candidate_config.txt: Check the difference between the device config and the config file
- rollback running-config file rollback_config.txt: To roll back the configuration
- rollback running-config file candidate_config.txt verbose: Manually do the replace_config, verbose shows the cmds entered live
Some useful commands to see what happened when an attempt at deploying configuration was made.
- show accounting log: See all the commands run on NXOS
- show rollback status: Details on whether the last install was a success or fail and the time it took
- show rollback log exec: Line by line, the commands applied and possibly the command that made it fail
- show rollback log verify: Result of verifying that the actual config is what was declared in the applied config
Troubleshooting deployment failures
The first step is to use a combination of show rollback log verify and show rollback log exec to see if the reason for failure is obvious.
show rollback log verify shows what configuration was missing before the change was rolled back. The output from this command is not always clear, especially when applying lots of configuration. The output below shows that the command boot nxos bootflash:/nxos.9.2.4.bin was applied (it is in the running config) but the expected command (in the applied config file) was boot nxos bootflash:/nxos.9.2.4.bin sup-1, so the running config was missing sup-1. This was a mistake on my part: later NXOS versions require sup-1 and I had forgotten to take it out of my templates.
DC1-N9K-LEAF01# show rollback log verify
Operation : Rollback to Checkpoint File
Checkpoint file name : /candidate_config.txt
Scheme : bootflash
Rollback done By : admin
Rollback mode : atomic
Verbose : disabled
Start Time : Fri, 21:49:28 18 Sep 2020
Start Time UTC : Fri, 21:49:28 18 Sep 2020
End Time : Fri, 21:57:19 18 Sep 2020
End Time UTC : Fri, 21:57:19 18 Sep 2020
Status : Failed
Verification patch contains the following commands:
---------------------------------------------------
!!
Configuration To Be Removed Present in Running-config
=====================================================
!
boot nxos bootflash:/nxos.9.2.4.bin
Configuration To Be Added Missing in Running-config
===================================================
!
boot nxos bootflash:/nxos.9.2.4.bin sup-1
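When the verify log is long, pulling the two sections out programmatically can make the mismatch easier to spot. This is a minimal sketch that assumes the section headings look exactly like those in the output above; the function name is hypothetical.

```python
def parse_verify_log(log_text):
    """Split a verify log into commands to be removed and commands to be added."""
    sections = {"remove": [], "add": []}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Configuration To Be Removed"):
            current = "remove"
        elif line.startswith("Configuration To Be Added"):
            current = "add"
        # Skip blank lines and separator lines made only of '!', '=' or '-'
        elif current and line and not set(line) <= set("!=-"):
            sections[current].append(line)
    return sections


sample_log = """Configuration To Be Removed Present in Running-config
=====================================================
!
boot nxos bootflash:/nxos.9.2.4.bin
Configuration To Be Added Missing in Running-config
===================================================
!
boot nxos bootflash:/nxos.9.2.4.bin sup-1"""

result = parse_verify_log(sample_log)
print(result["remove"])
print(result["add"])
```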
In this situation show rollback log exec wouldn’t give any indication of what the problem was, as the command doesn’t cause the CLI to raise an error.
switch# show rollback log exec
Operation : Rollback to Checkpoint File
Checkpoint file name : /candidate_config.txt
Scheme : bootflash
Rollback done By : admin
Rollback mode : atomic
Verbose : disabled
Start Time : Sat, 15:12:46 19 Sep 2020
Start Time UTC : Sat, 15:12:46 19 Sep 2020
End Time : Sat, 15:15:28 19 Sep 2020
End Time UTC : Sat, 15:15:28 19 Sep 2020
Rollback Status : Failed
Restoring Previous Config : Success
Executing Patch:
----------------
`config t `
`interface Ethernet1/128`
`shutdown`
`exit`
.....
`boot nxos bootflash:/nxos.9.2.4.bin sup-1`
Performing image verification and compatibility check, please wait....
`interface Ethernet1/5`
`no shutdown`
`interface Ethernet1/6`
`no shutdown`
`exit`
If the image didn’t exist then this would be shown in show rollback log exec, as that does cause the CLI to raise an error.
`crypto key param rsa label DC1-N9K-LEAF01.stesworld.com modulus 2048`
`boot nxos bootflash:/nxos.9.2.3.bin sup-1`
Image provided does not exist.
Failed to set the boot variable: image not found (0x40450008)
Retrying Rollback Patch:
----------------
`config t `
`interface Ethernet1/6`
`no switchport trunk allowed vlan`
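In the exec log the commands are wrapped in backticks and anything the CLI printed back is not, so one rough way to narrow down a long log is to list only the commands that produced output. This is a heuristic sketch with a hypothetical function name; harmless messages (such as the image verification notice) will also be listed.

```python
def commands_with_output(exec_log):
    """Pair each backtick-quoted command with the CLI output that followed it."""
    results = []
    current = None
    for raw in exec_log.splitlines():
        line = raw.strip()
        if len(line) > 1 and line.startswith("`") and line.endswith("`"):
            current = (line.strip("`").strip(), [])
            results.append(current)
        # Ignore blank lines, section headings and '----' or '.....' separators
        elif current and line and not line.endswith("Patch:") and set(line) - set("-."):
            current[1].append(line)
    return [(cmd, out) for cmd, out in results if out]


sample_log = """`crypto key param rsa label DC1-N9K-LEAF01.stesworld.com modulus 2048`
`boot nxos bootflash:/nxos.9.2.3.bin sup-1`
Image provided does not exist.
Failed to set the boot variable: image not found (0x40450008)
Retrying Rollback Patch:
----------------
`config t `
`interface Ethernet1/6`"""

noisy = commands_with_output(sample_log)
for command, output_lines in noisy:
    print(command, output_lines)
```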
If you can’t find the source of the problem from either of these commands, I find the best thing to do is download the base config file locally and manually compare that against what you are trying to deploy. The problem is normally some hidden commands that you have forgotten about.
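from napalm import get_network_driver

# Load the NXOS driver and connect to the device
driver = get_network_driver('nxos')
device = driver('10.10.108.21', 'admin', 'ansible')
device.open()
# Save the device checkpoint (base config) locally for comparison
with open("base_config.txt", mode='w') as x:
    x.write(device._get_checkpoint_file())
device.close()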
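Once the base config is saved locally, the comparison can be done with difflib from the standard library. A minimal sketch follows; the function name, file labels and the two sample configs are hypothetical.

```python
import difflib


def config_diff(base_text, candidate_text):
    """Return unified diff lines between the base config and the candidate."""
    return list(
        difflib.unified_diff(
            base_text.splitlines(),
            candidate_text.splitlines(),
            fromfile="base_config.txt",
            tofile="config.cfg",
            lineterm="",
            n=1,
        )
    )


# Hypothetical snippets: the candidate is missing a hidden command.
base = "interface Ethernet1/5\n  !#switchport\n  !#no shutdown\n"
candidate = "interface Ethernet1/5\n  !#switchport\n"

diff = config_diff(base, candidate)
print("\n".join(diff))
```

Lines starting with - are in the device's base config but missing from the file being deployed, which is where forgotten hidden commands show up.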
Some common issues I have come across so far:
- Trunk ports have to use switchport trunk allowed vlan 1-4094 instead of switchport trunk allowed vlan all
- !#switchport trunk allowed vlan 1-4094 is required even if the interface is switchport mode access
- Make sure all used interfaces have !#no shutdown. It is a hidden command so you won’t see it in show run (NXOS hides commands by using !#)
- Referencing source-interfaces in the configuration before they have been created. Put source-interfaces nearer the end of the config file
- For port-channels only the port-channel (not the physical interface) has the switchport mode command. If it is a trunk, switchport trunk allowed vlan x is also only on the port-channel
- The physical interface for port-channels has to be in the format channel-group 27 force mode active, with the keyword force
- There is no need to have the vlan 1, 10, 20, 30 command for all VLANs created; they are all entered vertically with the name under the VLAN
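Some of these issues are easy to check for before deploying. The sketch below covers just two of them (allowed vlan all and a channel-group without force); the function name and sample config are hypothetical.

```python
import re


def lint_candidate(config_text):
    """Flag lines that match two of the common issues listed above."""
    problems = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if line.endswith("switchport trunk allowed vlan all"):
            problems.append((number, "use 'vlan 1-4094' instead of 'vlan all'"))
        # 'channel-group 27 mode active' has no 'force' before 'mode'
        if re.match(r"channel-group \d+ mode ", line):
            problems.append((number, "channel-group is missing the 'force' keyword"))
    return problems


sample = (
    "interface Ethernet1/1\n"
    "  switchport trunk allowed vlan all\n"
    "  channel-group 27 mode active\n"
    "interface Ethernet1/2\n"
    "  channel-group 27 force mode active\n"
)
for number, message in lint_candidate(sample):
    print(number, message)
```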
Dealing with interfaces
If the interface is an access port (including port-channels) it always needs this command or the deployment will fail. The only exception to this is interfaces that are Layer3 ports.
!#switchport trunk allowed vlan 1-4094
For example, a dual-homed access port would look like this. Notice how switchport mode access is only on the port-channel:
interface Ethernet1/13
  description ACCESS >DC1-SRV-APP01 eth1
  spanning-tree port type network
  !#switchport
  switchport access vlan 10
  channel-group 13 force mode active
  no shutdown
interface Port-channel13
  description ACCESS >DC1-SRV-APP01 eth1
  spanning-tree port type network
  !#switchport
  !#switchport trunk allowed vlan 1-4094
  switchport access vlan 10
  switchport mode access
  no shutdown
Dual-homed trunk ports can’t have switchport mode or allowed vlans under the ethernet interface or the deployment will fail with this error:
Retrying Rollback Patch:
----------------
`config t `
`interface Ethernet1/14`
`switchport trunk allowed vlan 110, 120`
Syntax error while parsing 'switchport trunk allowed vlan 110, 120'
A dual-homed trunk port should instead look like this, with the trunk commands only on the port-channel:
interface Ethernet1/14
  description UPLINK > DC1-VIOS-SW1
  spanning-tree port type network
  !#switchport
  channel-group 17 force mode active
  no shutdown
interface Port-channel17
  description UPLINK > DC1-VIOS-SW1
  spanning-tree port type network
  !#switchport
  switchport trunk allowed vlan 110,120
  switchport mode trunk
  vpc 17
  no shutdown
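The rule that switchport mode and allowed vlans belong only on the port-channel can also be checked before deploying. The sketch below assumes the config file indents the lines under each interface; the function name and sample are hypothetical.

```python
def member_interface_errors(config_text):
    """Find port-channel member interfaces carrying port-channel-only commands."""
    blocks, name = {}, None
    for raw in config_text.splitlines():
        if raw.startswith("interface "):
            name = raw.split(None, 1)[1].strip()
            blocks[name] = []
        elif raw[:1].isspace() and name:
            blocks[name].append(raw.strip())
        else:
            name = None  # any other top-level line ends the interface block
    errors = []
    for name, lines in blocks.items():
        # Only physical members (those with a channel-group) are checked
        if not any(line.startswith("channel-group ") for line in lines):
            continue
        for line in lines:
            if line.startswith(("switchport mode", "switchport trunk allowed vlan")):
                errors.append((name, line))
    return errors


sample = (
    "interface Ethernet1/14\n"
    "  switchport trunk allowed vlan 110,120\n"
    "  channel-group 17 force mode active\n"
    "interface Port-channel17\n"
    "  switchport trunk allowed vlan 110,120\n"
    "  switchport mode trunk\n"
)
print(member_interface_errors(sample))
```

Hidden lines that start with !# are not flagged, since they do not start with switchport.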
Merge
Napalm implements merges by simply applying the configuration line by line; it doesn’t use the checkpoint rollback functionality. This means that the changes made are not atomic, and to delete something you have to specifically define the configuration to do so.
- name: "CFG >> Merging changes with current config"
  napalm_install_config:
    provider: "{{ ans.creds_all }}"
    dev_os: "{{ ansible_network_os }}"
    timeout: 60
    config_file: "{{ ans.dir_path }}/{{ inventory_hostname }}/config/config.cfg"
    commit_changes: True
    diff_file: "{{ ans.dir_path }}/diff/{{ inventory_hostname }}.txt"
    get_diffs: True
  register: changes
- debug: var=changes.msg.splitlines()
  tags: [diff]
The commands that it runs are as follows:
- checkpoint file bootflash:/rollback_config.txt
- Applies merge config line by line.
- copy running-config startup-config
Diffs for merges are simply the lines in the merge candidate config. It is not going to show you any differences unless you are specifically deleting (using no) something from the config.
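Since a merge never removes anything on its own, the merge file has to spell out removals as no commands. A minimal sketch of building such a file is below; the helper name is hypothetical and it only handles top-level commands (anything nested under an interface or similar would need its parent line as well).

```python
def build_merge_candidate(add=(), remove=()):
    """Build merge config text; removals are written as explicit 'no' commands."""
    lines = [f"no {cmd}" for cmd in remove] + list(add)
    return "\n".join(lines) + "\n"


# Hypothetical change: swap one feature for another.
candidate = build_merge_candidate(add=["feature bgp"], remove=["feature ospf"])
print(candidate)
```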