Upgrading Mellanox NIC Firmware on ESXi

    NVMe over RoCE places strict requirements on the firmware compatibility of the RoCE NIC.

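    • Enable SSH on the ESXi host, put the host into maintenance mode, and log in remotely.
    • Run esxcfg-nics -l to list the NICs, then esxcli network nic get -n <vmnic> to show a NIC's driver and firmware versions.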
    [root@yaoge123:~] esxcfg-nics -l
    Name    PCI          Driver      Link Speed      Duplex MAC Address       MTU    Description                   
    vmnic0  0000:19:00.0 igbn        Up   1000Mbps   Full   **:**:**:**:**:** 1500   Intel Corporation I350 Gigabit Network Connection
    vmnic1  0000:19:00.1 igbn        Up   1000Mbps   Full   **:**:**:**:**:** 1500   Intel Corporation I350 Gigabit Network Connection
    vmnic2  0000:8a:00.0 nmlx5_core  Up   25000Mbps  Full   **:**:**:**:**:** 1500   Mellanox Technologies ConnectX-5 EN NIC; 10/25GbE; dual-port SFP28; PCIe3.0 x8; (MCX512A-ACU)
    vmnic3  0000:8a:00.1 nmlx5_core  Up   25000Mbps  Full   **:**:**:**:**:** 1500   Mellanox Technologies ConnectX-5 EN NIC; 10/25GbE; dual-port SFP28; PCIe3.0 x8; (MCX512A-ACU)
    vmnic4  0000:8b:00.0 nmlx5_core  Up   25000Mbps  Full   **:**:**:**:**:** 1500   Mellanox Technologies ConnectX-5 EN NIC; 10/25GbE; dual-port SFP28; PCIe3.0 x8; (MCX512A-ACU)
    vmnic5  0000:8b:00.1 nmlx5_core  Up   25000Mbps  Full   **:**:**:**:**:** 1500   Mellanox Technologies ConnectX-5 EN NIC; 10/25GbE; dual-port SFP28; PCIe3.0 x8; (MCX512A-ACU)
    [root@yaoge123:~] esxcli network nic get -n vmnic2
       Advertised Auto Negotiation: true
       Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
       Auto Negotiation: true
       Backing DPUId: N/A
       Cable Type: FIBRE
       Current Message Level: -1
       Driver Info: 
             Bus Info: 0000:8a:00:0
             Driver: nmlx5_core
             Firmware Version: 16.32.1010
             Version: 4.23.0.36
       Link Detected: true
       Link Status: Up 
       Name: vmnic2
       PHYAddress: 0
       Pause Autonegotiate: false
       Pause RX: true
       Pause TX: true
       Supported Ports: FIBRE, DA
       Supports Auto Negotiation: true
       Supports Pause: true
       Supports Wakeon: false
       Transceiver: internal
       Virtual Address: **:**:**:**:**:**
       Wakeon: None
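
    To check every Mellanox port at once rather than one vmnic at a time, the two commands above can be chained. The following is a minimal sketch, assuming awk is available in the ESXi shell; mlx_nics and nic_fw_version are hypothetical helper names, not part of esxcli.

```shell
#!/bin/sh
# Names of the NICs bound to the nmlx5_core driver.
# Reads the output of `esxcfg-nics -l` from stdin (driver is the 3rd column).
mlx_nics() {
    awk '$3 == "nmlx5_core" { print $1 }'
}

# Firmware version of one NIC.
# Reads the output of `esxcli network nic get -n <vmnic>` from stdin.
nic_fw_version() {
    awk -F': *' '/Firmware Version:/ { print $2 }'
}

# Intended use on the ESXi host (not run here):
#   for nic in $(esxcfg-nics -l | mlx_nics); do
#       echo "$nic $(esxcli network nic get -n "$nic" | nic_fw_version)"
#   done
```

    Both helpers only parse text, so they can be tried against saved command output before being run on a host.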
    
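    • Look up the NIC model in the VMware Compatibility Guide to find the supported driver and firmware versions.
    • Download MFT (Mellanox Firmware Tools), choosing the package that matches the ESXi version.
    • Unzip the downloaded Mellanox-MFT-Tools_*-package.zip and upload the .zip file it contains to /tmp on the ESXi host; confirm that index.xml sits at the root of the uploaded .zip.
    • Do the same for the downloaded Mellanox-NATIVE-NMST_*-package.zip: unzip it, upload the inner .zip to /tmp, and confirm that index.xml sits at its root.
    • Install both uploaded .zip files with esxcli software component apply -d; the path must be absolute.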
    [root@yaoge123:/tmp] esxcli software component apply -d /tmp/Mellanox-MFT-Tools_4.26.1.101-1OEM.801.0.0.21495797_22944840.zip 
    Installation Result
       Message: Operation finished successfully.
       Components Installed: Mellanox-MFT-Tools_4.26.1.101-1OEM.801.0.0.21495797
       Components Removed: 
       Components Skipped: 
       Reboot Required: false
       DPU Results: 
    [root@yaoge123:/tmp] esxcli software component apply -d /tmp/Mellanox-NATIVE-NMST_4.26.1.101-1OEM.801.0.0.21495797_22944879.zip 
    Installation Result
       Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
       Components Installed: Mellanox-NATIVE-NMST_4.26.1.101-1OEM.801.0.0.21495797
       Components Removed: 
       Components Skipped: 
       Reboot Required: true
       DPU Results: 
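
    The index.xml check is easy to get wrong by uploading the outer package instead of the inner .zip. A minimal sketch of an automated check, assuming an unzip command that supports -l is available where the check runs; has_root_index is a hypothetical helper that reads the archive listing from standard input.

```shell
#!/bin/sh
# Succeeds if a zip listing (output of `unzip -l`, read from stdin)
# shows index.xml at the archive root rather than inside a subdirectory.
has_root_index() {
    awk '$NF == "index.xml" { found = 1 } END { exit !found }'
}

# Intended use (not run here), with the file name from the transcript above:
#   zip=/tmp/Mellanox-MFT-Tools_4.26.1.101-1OEM.801.0.0.21495797_22944840.zip
#   if unzip -l "$zip" | has_root_index; then
#       esxcli software component apply -d "$zip"
#   else
#       echo "index.xml is not at the root of $zip" >&2
#   fi
```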
    
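    • Reboot ESXi and enable SSH again.
    • Download the firmware: find the compatible firmware version for the NIC model and download it.
    • Unzip the downloaded fw-*.bin.zip and upload the extracted .bin file to /tmp on the ESXi host.
    • Run /opt/mellanox/bin/mlxfwmanager from /tmp; it discovers the new firmware file automatically. Then run it again with -u to perform the upgrade.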
    [root@yaoge123:/tmp] /opt/mellanox/bin/mlxfwmanager
    Querying Mellanox devices firmware ...
    
    Device #1:
    ----------
    
      Device Type:      ConnectX5
      Part Number:      MCX512A-ACU_Ax_Bx
      Description:      ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; UEFI Enabled (x86/ARM)
      PSID:             MT_0000000425
      PCI Device Name:  mt4119_pciconf0
      Base GUID:        ***
      Base MAC:         ***
      Versions:         Current        Available     
         FW             16.32.1010     16.34.1002    
         PXE            3.6.0502       3.6.0700      
         UEFI           14.25.0017     14.27.0014    
    
      Status:           Update required
    
    Device #2:
    ----------
    
      Device Type:      ConnectX5
      Part Number:      MCX512A-ACU_Ax_Bx
      Description:      ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; UEFI Enabled (x86/ARM)
      PSID:             MT_0000000425
      PCI Device Name:  mt4119_pciconf1
      Base GUID:        ***
      Base MAC:         ***
      Versions:         Current        Available     
         FW             16.32.1010     16.34.1002    
         PXE            3.6.0502       3.6.0700      
         UEFI           14.25.0017     14.27.0014    
    
      Status:           Update required
    
    ---------
    Found 2 device(s) requiring firmware update. Please use -u flag to perform the update.
    
    [root@yaoge123:/tmp] /opt/mellanox/bin/mlxfwmanager -u
    Querying Mellanox devices firmware ...
    
    Device #1:
    ----------
    
      Device Type:      ConnectX5
      Part Number:      MCX512A-ACU_Ax_Bx
      Description:      ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; UEFI Enabled (x86/ARM)
      PSID:             MT_0000000425
      PCI Device Name:  mt4119_pciconf0
      Base GUID:        ***
      Base MAC:         ***
      Versions:         Current        Available     
         FW             16.32.1010     16.34.1002    
         PXE            3.6.0502       3.6.0700      
         UEFI           14.25.0017     14.27.0014    
    
      Status:           Update required
    
    Device #2:
    ----------
    
      Device Type:      ConnectX5
      Part Number:      MCX512A-ACU_Ax_Bx
      Description:      ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; UEFI Enabled (x86/ARM)
      PSID:             MT_0000000425
      PCI Device Name:  mt4119_pciconf1
      Base GUID:        ***
      Base MAC:         ***
      Versions:         Current        Available     
         FW             16.32.1010     16.34.1002    
         PXE            3.6.0502       3.6.0700      
         UEFI           14.25.0017     14.27.0014    
    
      Status:           Update required
    
    ---------
    Found 2 device(s) requiring firmware update...
    
    Perform FW update? [y/N]: y
    Device #1: Updating FW ...     
    FSMST_INITIALIZE -   OK          
    Writing Boot image component -   OK                                                                                                                                                              Done
    Device #2: Updating FW ...     
    FSMST_INITIALIZE -   OK          
    Writing Boot image component -   OK                                                                                                                                                              Done
    
    Restart needed for updates to take effect.
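
    The query output is long when a host has several adapters. As a sketch, it can be condensed to one line per adapter by parsing the text shown above; fw_update_summary is a hypothetical helper and relies only on the "PCI Device Name" and "FW" lines appearing as they do in this transcript.

```shell
#!/bin/sh
# Condense `mlxfwmanager` query output (read from stdin) to
# "<pci device> <current fw> -> <available fw>", one line per adapter.
fw_update_summary() {
    awk '/PCI Device Name:/ { dev = $NF }
         $1 == "FW"         { print dev, $2, "->", $3 }'
}

# Intended use on the ESXi host (not run here):
#   cd /tmp && /opt/mellanox/bin/mlxfwmanager | fw_update_summary
```

    For the two adapters in this walkthrough, the summary would read mt4119_pciconf0 16.32.1010 -> 16.34.1002 and mt4119_pciconf1 16.32.1010 -> 16.34.1002.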
    
    • After rebooting, check the NIC firmware version again.
    [root@yaoge123:~] esxcli network nic get -n vmnic2
       Advertised Auto Negotiation: true
       Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
       Auto Negotiation: true
       Backing DPUId: N/A
       Cable Type: FIBRE
       Current Message Level: -1
       Driver Info: 
             Bus Info: 0000:8a:00:0
             Driver: nmlx5_core
             Firmware Version: 16.34.1002
             Version: 4.23.0.36
       Link Detected: true
       Link Status: Up 
       Name: vmnic2
       PHYAddress: 0
       Pause Autonegotiate: false
       Pause RX: true
       Pause TX: true
       Supported Ports: FIBRE, DA
       Supports Auto Negotiation: true
       Supports Pause: true
       Supports Wakeon: false
       Transceiver: internal
       Virtual Address: **:**:**:**:**:**
       Wakeon: None
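
    To confirm the result in a script rather than by eye, the reported firmware version can be compared with the target version. A sketch in plain shell and awk; version_ge is a hypothetical helper, and 16.34.1002 is simply the version installed in this walkthrough.

```shell
#!/bin/sh
# version_ge <a> <b>: succeeds when dotted version a is >= dotted version b.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if ((x[i] + 0) > (y[i] + 0)) exit 0
            if ((x[i] + 0) < (y[i] + 0)) exit 1
        }
        exit 0
    }'
}

# Intended use on the ESXi host (not run here):
#   fw=$(esxcli network nic get -n vmnic2 | awk -F': *' '/Firmware Version:/ { print $2 }')
#   version_ge "$fw" 16.34.1002 && echo "vmnic2 firmware is up to date: $fw"
```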
    

    I/O Load of the Veeam Backup Methods

    I/O load

    Method                                       I/O impact on destination storage
    Forward incremental                          1x write I/O for incremental backup size
    Forward incremental, active full             1x write I/O for total full backup size
    Forward incremental, transform               2x I/O (1x read, 1x write) for incremental backup size
    Forward incremental, synthetic full          2x I/O (1x read, 1x write) for entire backup chain
    Reversed incremental                         3x I/O (1x read, 2x write) for incremental backup size
    Synthetic full with transform to rollbacks   4x I/O (2x read, 2x write) for entire backup chain
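
    The multipliers in the table turn into concrete numbers once sizes are plugged in. A small worked sketch; io_gb and its method keywords are names invented here, and the sizes in the usage note are made-up example values, not measurements.

```shell
#!/bin/sh
# io_gb <method> <incr_gb> <full_gb> <chain_gb>
# Approximate I/O volume in GB on the backup storage for one run,
# using the multipliers from the table above.
io_gb() {
    method=$1
    incr=$2
    full=$3
    chain=$4
    case "$method" in
        forward)             echo $(( 1 * $incr )) ;;   # 1x write, incremental size
        active-full)         echo $(( 1 * $full )) ;;   # 1x write, full size
        transform)           echo $(( 2 * $incr )) ;;   # 1x read + 1x write
        synthetic-full)      echo $(( 2 * $chain )) ;;  # 1x read + 1x write, whole chain
        reversed)            echo $(( 3 * $incr )) ;;   # 1x read + 2x write
        synthetic-rollbacks) echo $(( 4 * $chain )) ;;  # 2x read + 2x write, whole chain
        *) echo "unknown method: $method" >&2; return 1 ;;
    esac
}
```

    With a 50 GB increment, a 1000 GB full backup and a 1500 GB chain, io_gb forward 50 1000 1500 prints 50, io_gb reversed 50 1000 1500 prints 150, and io_gb synthetic-rollbacks 50 1000 1500 prints 6000.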

     

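    Reversed Incremental Backup: each backup run puts heavy I/O pressure on the backup storage and the backup window is long, but it uses the least backup space because only one full backup, the most recent, is kept.

    Forward Incremental Backup: each backup run puts the least I/O pressure on the backup storage and has the shortest backup window, but extra backup space may be needed to hold several full backups.
    Forever Forward Incremental Backup: no load on the source storage from the merge. If the virtual machine changes a lot, merging the oldest increment into the full backup can be I/O heavy. Only one full backup, the oldest restore point, is kept.
    Synthetic Full Backup: no load on the source storage, but fairly heavy I/O on the backup storage. Because it is not a backup run, it has no backup window. Several full backups are usually kept.
    Transforming Incremental Backup Chains into Reversed Incremental Backup Chains: no load on the source storage, but extremely heavy I/O on the backup storage. Because it is not a backup run, it has no backup window. Only one full backup is kept.
    Active Full Backup: builds a full backup entirely from the source storage, so all data has to be read from the source. Writes to the backup storage are sequential, so the load there is modest, but the backup window is very long and the performance of the production source storage suffers. Several full backups are usually kept.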

     

    Upgrading VCSA 6.0 to 6.5

    Password reset

    1. Reset the VCSA OS GRUB password
      http://www.unixarena.com/2016/04/reset-grub-root-password-vcsa-6-0.html
    2. Reset the VCSA OS root password
      https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2069041
    3. Reset the administrator@vsphere.local password
      https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2146224

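    Migrating vCenter Update Manager

    Starting with VCSA 6.5, Update Manager is integrated into VCSA, so the migration tool is needed to move a previously standalone Update Manager into VCSA 6.5.

    1. If the VCSA password has been changed, it is best to reconfigure vCenter Update Manager and restart it
      https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1034605
    2. Run the migration tool on the vCenter Update Manager machine and leave its window open; the program exits on its own once the migration completes
      https://docs.vmware.com/cn/VMware-vSphere/6.5/com.vmware.vsphere.upgrade.doc/GUID-6A39008B-A78C-4632-BC55-0517205198C5_copy.html
    3. Make sure vCenter Update Manager has enough free disk space, because the migration packs files into an archive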

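    Deployment size during migration

    When choosing the deployment size during the VCSA migration, the smaller sizes such as tiny/small are not offered, mainly because the original VCSA consumes too much storage space.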
    https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2148587

    Setting the time zone

    After the upgrade, the time zone shown in VAMI is blank and cannot be set there. Log in to the VCSA over SSH and run:

    cd /etc/
    rm -rf localtime
    ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
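
    The three commands above can be wrapped so that a mistyped zone name does not leave /etc/localtime dangling. A sketch only; set_timezone and its optional root-prefix argument are hypothetical, the prefix being there so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# set_timezone <zone> [root-prefix]
# Point <root-prefix>/etc/localtime at the named zoneinfo file,
# refusing zone names that do not exist under /usr/share/zoneinfo.
set_timezone() {
    zone=$1
    root=${2:-}
    if [ ! -f "$root/usr/share/zoneinfo/$zone" ]; then
        echo "unknown time zone: $zone" >&2
        return 1
    fi
    # -f replaces an existing link or file, so no separate rm is needed.
    ln -sf "/usr/share/zoneinfo/$zone" "$root/etc/localtime"
}

# Intended use on the VCSA (not run here):
#   set_timezone Asia/Shanghai && date
```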

    Shockwave Flash crashes

    This is a known issue with Adobe Shockwave Flash version 27.0.0.170; the only remedy is to upgrade to a newer version or downgrade to an older one.
    https://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=2151945

     

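    VMware Migration Steps

    The original virtualization platform hardware had been in service for about 9 years, was already running with faults, and could no longer deliver the required performance, so a completely new set of hardware was purchased to replace it.

    First, install and commission the new platform hardware and add it to the existing VCSA as a new cluster.

    If the two clusters share storage, a virtual machine can be shut down, started on the new cluster, and then have its storage migrated online, so the downtime is short. If the two clusters do not share storage, the virtual machine has to stay powered off while its storage is migrated, so the downtime is longer.

    The new cluster uses DVS, but a DVS can only be managed through VCSA, so the VC must never communicate over a DVS that it manages itself. The DVS takes over the 10GbE network for business, vMotion, VSAN and other traffic, while VCSA is managed over the 1GbE network through a standard virtual switch.

    Shut down VCSA, log in with the Client to the ESXi host in the old cluster that holds VCSA, and export VCSA. When the export finishes, log in with the Client to any ESXi host in the new cluster and import VCSA. During the import no DVS port group can be selected; only standard switches are available.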

    Expanding a Linux Virtual Machine Disk on VMware

    Expand the root partition of a CentOS 7 virtual machine on VMware.

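    1. In VMware, edit the virtual machine hardware to enlarge the disk, and take a snapshot
    2. Boot the virtual machine from the CentOS installation disc and enter rescue mode
    3. parted /dev/sda, then resizepart the LVM partition
    4. partprobe
    5. pvresize /dev/sda3 to grow the PV; sda3 is the LVM partition
    6. pvs to confirm the PV has grown
    7. lvextend -l +100%FREE /dev/system/root to grow the LV; /dev/system/root is the LV path, which may instead be /dev/mapper/VolGroup-lv_root
    8. vgs to confirm the capacity has been extended
    9. For XFS: mount /dev/mapper/system-root /mnt; xfs_growfs /mnt
    10. For EXT4: e2fsck -f /dev/mapper/system-root; resize2fs /dev/mapper/system-root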
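
    The last two steps differ only by filesystem type, so the choice can be scripted. A minimal sketch that prints the command to run instead of running it; grow_fs_cmd is a hypothetical helper, the device and mount point are the ones used in the steps above, and xfs_growfs is given the mount point of the mounted filesystem.

```shell
#!/bin/sh
# grow_fs_cmd <fstype> <device> <mountpoint>
# Print the command that grows the filesystem after the LV has been extended.
grow_fs_cmd() {
    case "$1" in
        xfs)            echo "xfs_growfs $3" ;;   # XFS is grown while mounted
        ext2|ext3|ext4) echo "resize2fs $2" ;;    # run e2fsck -f on the device first
        *) echo "unsupported filesystem: $1" >&2; return 1 ;;
    esac
}

# Intended use in rescue mode (not run here); blkid reports the filesystem type:
#   dev=/dev/mapper/system-root
#   grow_fs_cmd "$(blkid -o value -s TYPE "$dev")" "$dev" /mnt
```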

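    Customizing the ESXi Installation Image

    For example, on a fanless mini PC with an Intel J1900 and four gigabit ports, the AHCI controller (Intel Atom Processor E3800 Series SATA AHCI Controller 8086:0f23) is not supported by ESXi, so a driver has to be added to the ESXi installation image.

    Preparation:

    1. Download and install VMware PowerCLI
    2. Download the ESXi Offline Bundle and place it in D:\VMware
    3. Download ESXi-Customizer-PS from http://www.v-front.de/p/esxi-customizer-ps.html and place it in D:\vm
    4. Download the sata-xahci Offline Bundle from https://vibsdepot.v-front.de/wiki/index.php/Sata-xahci and place it in D:\vm

    Start VMware PowerCLI and run the following command: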

    Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope CURRENTUSER

    Restart the computer, then open VMware PowerCLI again:

    PowerCLI C:\> d:
    PowerCLI D:\> cd vm
    PowerCLI D:\vm> .\ESXi-Customizer-PS-v2.4.ps1 -v60 -pkgDir d:\vm -log d:\vm\log.txt -izip 'D:\VMware\update-from-esxi6.0-6.0_update02.zip'
    PowerCLI D:\vm> Set-ExecutionPolicy Undefined -Scope CURRENTUSER

    Install ESXi from the newly generated ISO under D:\vm.

     

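    Replacing the VMware vCenter Server Appliance 5.5 Certificates

    1. Change the host's IP address, domain name and hostname to match the new certificate. Set Certificate regeneration enabled to Yes, reboot vCenter, then set Certificate regeneration enabled back to No.

    2. Stop the services: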

    service vmware-stsd stop
    service vmware-vpxd stop
    
    
    service vmware-rbd-watchdog stop
    rm /var/vmware/vpxd/autodeploy_registered
    

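    2. Upload the certificate, private key and certificate chain to ssl/vpxd with these file names: rui.crt for the certificate, rui.key for the private key, and cachain.pem for the certificate chain. cachain.pem holds the chain in reverse order, so the last entry in the file should be the self-signed root CA. Then concatenate the certificate and the chain: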

    cd
    mkdir ssl
    mkdir ssl/vpxd
    mkdir ssl/inventoryservice
    mkdir ssl/logbrowser
    mkdir ssl/autodeploy
    cd ssl/vpxd
    ……
    cat rui.crt cachain.pem > chain.pem
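
    Before handing chain.pem and rui.key to vpxd, two cheap sanity checks can catch a wrong upload. A sketch, assuming an RSA key and an openssl binary on the machine where it runs; cert_count and key_matches_cert are hypothetical helper names.

```shell
#!/bin/sh
# cert_count <pem-file>: number of certificates in a PEM file.
cert_count() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# key_matches_cert <cert-file> <rsa-key-file>
# Succeeds when the RSA private key belongs to the first certificate in the file,
# by comparing the modulus reported for each.
key_matches_cert() {
    cert_mod=$(openssl x509 -noout -modulus -in "$1") || return 1
    key_mod=$(openssl rsa -noout -modulus -in "$2") || return 1
    [ "$cert_mod" = "$key_mod" ]
}

# Intended use in ssl/vpxd (not run here):
#   cert_count chain.pem          # rui.crt plus every entry of cachain.pem
#   key_matches_cert rui.crt rui.key || echo "rui.key does not match rui.crt" >&2
```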

    3. Replace the vpxd certificate

    cd
    cd ssl/vpxd
    /usr/sbin/vpxd_servicecfg certificate change chain.pem rui.key
    

    A return value of VC_CFG_RESULT = 0 means success; any nonzero value indicates an error.

    4. Replace the vCenter Inventory Service certificate

    service vmware-stsd start
    cd /etc/vmware-sso/register-hooks.d
    ./02-inventoryservice --mode uninstall --ls-server https://server.domain.com:7444/lookupservice/sdk
    cd
    cp ssl/vpxd/* ssl/inventoryservice/
    cd ssl/inventoryservice/
    openssl pkcs12 -export -out rui.pfx -in chain.pem -inkey rui.key -name rui -passout pass:testpassword
    cp rui.key /usr/lib/vmware-vpx/inventoryservice/ssl
    cp rui.crt /usr/lib/vmware-vpx/inventoryservice/ssl
    cp rui.pfx /usr/lib/vmware-vpx/inventoryservice/ssl
    cd /usr/lib/vmware-vpx/inventoryservice/ssl/
    chmod 400 rui.key rui.pfx
    chmod 644 rui.crt
    cd /etc/vmware-sso/register-hooks.d
    ./02-inventoryservice --mode install --ls-server https://server.domain.com:7444/lookupservice/sdk --user administrator@vSphere.local --password sso_administrator_password
    rm /var/vmware/vpxd/inventoryservice_registered
    service vmware-inventoryservice stop
    service vmware-vpxd stop
    service vmware-inventoryservice start
    service vmware-vpxd start
    

    5. Replace the VMware Log Browser service certificate

    cd /etc/vmware-sso/register-hooks.d
    ./09-vmware-logbrowser --mode uninstall --ls-server https://server.domain.com:7444/lookupservice/sdk
    cd
    cp ssl/vpxd/* ssl/logbrowser/
    cd ssl/logbrowser/
    openssl pkcs12 -export -in rui.crt -inkey rui.key -name rui -passout pass:testpassword -out rui.pfx
    cp rui.key /usr/lib/vmware-logbrowser/conf
    cp rui.crt /usr/lib/vmware-logbrowser/conf
    cp rui.pfx /usr/lib/vmware-logbrowser/conf
    cd /usr/lib/vmware-logbrowser/conf
    chmod 400 rui.key rui.pfx
    chmod 644 rui.crt
    cd /etc/vmware-sso/register-hooks.d
    ./09-vmware-logbrowser --mode install --ls-server https://server.domain.com:7444/lookupservice/sdk --user administrator@vSphere.local --password sso_administrator_password
    service vmware-logbrowser stop
    service vmware-logbrowser start
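
    Steps 4 and 5 repeat the same copy-and-chmod sequence for two different directories. As a sketch, the sequence could be factored into one function; install_cert_files is a hypothetical helper, and the file names and modes are the ones used in the steps above.

```shell
#!/bin/sh
# install_cert_files <source-dir> <destination-dir>
# Copy rui.key, rui.crt and rui.pfx, then apply the modes used above:
# 400 for the key and the PKCS#12 bundle, 644 for the certificate.
install_cert_files() {
    src=$1
    dest=$2
    for f in rui.key rui.crt rui.pfx; do
        cp "$src/$f" "$dest/$f" || return 1
    done
    chmod 400 "$dest/rui.key" "$dest/rui.pfx" &&
    chmod 644 "$dest/rui.crt"
}

# Intended use on the appliance (not run here):
#   install_cert_files ~/ssl/inventoryservice /usr/lib/vmware-vpx/inventoryservice/ssl
#   install_cert_files ~/ssl/logbrowser /usr/lib/vmware-logbrowser/conf
```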

    6. Replace the vSphere Auto Deploy certificate

    cd
    cp ssl/vpxd/* ssl/autodeploy/
    cp ssl/autodeploy/rui.crt /etc/vmware-rbd/ssl/waiter.crt
    cp ssl/autodeploy/rui.key /etc/vmware-rbd/ssl/waiter.key
    cd /etc/vmware-rbd/ssl/
    chmod 644 waiter.crt
    chmod 400 waiter.key
    chown deploy:deploy waiter.crt waiter.key
    service vmware-rbd-watchdog stop
    rm /var/vmware/vpxd/autodeploy_registered
    service vmware-vpxd restart

    7. Reboot vCenter

    Veeam Backup Methods

    Veeam creates two kinds of backup files: a vbk is a full backup, while vrb/vib files are incremental backup files that record changes.

    Reversed Incremental Backup

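    Every incremental run updates the vbk file, so the vbk always holds the latest full backup and restoring the latest backup only requires the vbk. During each incremental run, the parts of the vbk that are about to be modified are saved to a vrb, hence the name Reversed: the vrb holds the old data that was overwritten, not the new changed data. Restoring an earlier backup requires combining the vrb files with the vbk. This method is always incremental and saves disk space; it is the recommended scheme for backups to disk.

    Retention Policy

    The retention policy deletes expired vrb increments immediately, which makes this the most space-efficient method.

    See the animated demonstration here.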

    Forward Incremental Backup

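    Each incremental run saves only the changed data as a new vib file. If the backup data has to be copied to tape or to a remote site, only the new vib file has to be stored each time; and if regulations forbid modifying backups, this is the best choice. Obviously the number of vib files keeps growing, so a restore would have to merge too many of them; active full or synthetic full backups are therefore needed to solve the long-chain problem.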

    Forever forward incremental Backup

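    A full backup is created only on the first run; every later run creates an incremental backup. Once the required retention is reached, the earliest incremental backup is merged into the full backup to form a new full backup, as if the full backup were moving forward. The backup storage holds only one full backup.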
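
    The rolling merge is easier to follow with a toy simulation. This is a simplified model, not Veeam's implementation: retention is counted in restore points, one backup runs per step, and vbkN/vibN are labels invented here, where vbkN means the single full backup now represents the state as of run N.

```shell
#!/bin/sh
# forever_forward <retention> <runs>: print the backup chain after each run.
forever_forward() {
    retention=$1
    runs=$2
    full=1      # run whose state the single full backup (vbk) represents
    chain=""    # space-separated increments (vib) that follow the full backup
    run=1
    while [ "$run" -le "$runs" ]; do
        # Every run after the first adds one increment to the chain.
        if [ "$run" -gt 1 ]; then
            chain="${chain:+$chain }vib$run"
        fi
        # Restore points = the full backup plus every increment.
        set -- $chain
        if [ $(( $# + 1 )) -gt "$retention" ]; then
            # Over retention: merge the oldest increment into the full backup.
            full=$(( full + 1 ))
            shift
            chain="$*"
        fi
        echo "run $run: vbk$full${chain:+ $chain}"
        run=$(( run + 1 ))
    done
}
```

    With a retention of 3 restore points, forever_forward 3 5 prints "run 1: vbk1", "run 2: vbk1 vib2", "run 3: vbk1 vib2 vib3", "run 4: vbk2 vib3 vib4" and "run 5: vbk3 vib4 vib5", one per line: from run 4 onward the full backup moves forward by one increment per run.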

    Synthetic Full Backup

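    An active full backup is very demanding on the source system. A synthetic full backup instead merges the previous full backup (vbk) and the incremental backups (vib) into a new full backup. Because the source does not need to be read, the load on the source system is far lower. Subsequent forward incremental backups then use this new full backup as the base for their incremental files.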

    Transforming Incremental Backup Chains into Reversed Incremental Backup Chains

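    This option is available when synthetic full backups are enabled for incremental backup. Only one full backup is kept: the backups before this full backup are transformed into reversed incremental form, with only the overwritten data saved to vrb files, while the data after the full backup continues in the normal incremental form.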

    Active Full Backup

    Creates a full backup from the source; subsequent forward incremental backups use this new full backup as the base for their incremental files.

    Retention Policy (forward incremental backup)

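    If Synthetic Full Backup or Active Full Backup is enabled, an entire incremental chain is deleted only once the last incremental backup in that chain has expired. If neither is enabled, only one full backup exists, and each backup run merges the full backup with the oldest increment into a new full backup.

    See the animated demonstration here.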