Prometheus + Grafana 监控 NVIDIA GPU

1.首先安装 NVIDIA Data Center GPU Manager (DCGM),从 https://developer.nvidia.com/dcgm 下载安装

nv-hostengine -t
yum erase -y datacenter-gpu-manager
rpm -ivh datacenter-gpu-manager*
systemctl enable --now dcgm.service

2. 安装 NVIDIA DCGM exporter for Prometheus,从 https://github.com/NVIDIA/gpu-monitoring-tools/tree/master/exporters/prometheus-dcgm 下载手工安装

wget -q -O /usr/local/bin/dcgm-exporter https://raw.githubusercontent.com/NVIDIA/gpu-monitoring-tools/master/exporters/prometheus-dcgm/dcgm-exporter/dcgm-exporter
chmod +x /usr/local/bin/dcgm-exporter
mkdir /run/prometheus 
wget -q -O /etc/systemd/system/prometheus-dcgm.service https://raw.githubusercontent.com/NVIDIA/gpu-monitoring-tools/master/exporters/prometheus-dcgm/bare-metal/prometheus-dcgm.service
systemctl daemon-reload
systemctl enable --now prometheus-dcgm.service

3. 从 https://prometheus.io/download/#node_exporter 下载 node_exporter,手工安装为服务并添加 dcgm-exporter 资料

tar xf node_exporter*.tar.gz
mv node_exporter-*/node_exporter /usr/local/bin/
chown root:root /usr/local/bin/node_exporter
chmod +x /usr/local/bin/node_exporter

cat > /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sed -i '/ExecStart=\/usr\/local\/bin\/node_exporter/c\ExecStart=\/usr\/local\/bin\/node_exporter --collector.textfile.directory=\/run\/prometheus' /etc/systemd/system/node_exporter.service

systemctl daemon-reload
systemctl enable --now node_exporter.service

4. Grafana 添加这个Dashboard
https://grafana.com/grafana/dashboards/11752

Docker 自动更新 Let’s Encrypt

在nginx的 docker run 中添加webroot和配置文件挂载

-v $PWD/nginx/letsencrypt/:/var/www/letsencrypt:ro \
-v $PWD/letsencrypt/etc/:/etc/nginx/letsencrypt/:ro \

在nginx中将wwwroot发布出去

location ^~ /.well-known/ {
    root /var/www/letsencrypt/;
}

在nginx中配置证书文件

ssl_certificate letsencrypt/live/www.yaoge123.com/fullchain.pem;
ssl_certificate_key letsencrypt/live/www.yaoge123.com/privkey.pem;

创建 certbot 的docker run脚本,以后只要周期性运行这个脚本就可以自动更新证书了

#!/bin/sh
cd $(dirname $0)
pwd

docker run -it --rm \
	-v $PWD/letsencrypt/etc:/etc/letsencrypt \
	-v $PWD/letsencrypt/lib:/var/lib/letsencrypt \
	-v $PWD/letsencrypt/log:/var/log/letsencrypt \
	-v $PWD/nginx/letsencrypt:/var/www \
	certbot/certbot \
	certonly --webroot \
	--email yaoge123@example.com --agree-tos --no-eff-email \
	--webroot-path=/var/www/ \
	-n \
	--domains www.yaoge123.com
docker kill --signal=HUP nginx

CentOS 7 YUM 安装 Cacti

先添加EPEL再用yum安装cacti和中文字体

yum install cacti cacti-spine mariadb-server google-noto-sans-simplified-chinese-fonts

编辑 /etc/httpd/conf.d/cacti.conf ,在 Directory /usr/share/cacti/ 中添加可访问的浏览器客户端

编辑 /etc/cron.d/cacti ,去掉注释

编辑 /etc/spine.conf ,注释RDB_*

创建数据库

[root@yaoge123]# mysqladmin --user=root create cacti

创建数据库用户

[root@yaoge123]# mysql --user=root mysql
MariaDB [mysql]> GRANT ALL ON cacti.* TO cactiuser@localhost IDENTIFIED BY 'cactiuser';
MariaDB [mysql]> flush privileges;

数据库用户增加 timezone 权限

[root@yaoge123]# mysql -u root
MariaDB [(none)]> GRANT SELECT ON mysql.time_zone_name TO cactiuser@localhost IDENTIFIED BY 'cactiuser';
MariaDB [(none)]> flush privileges;

数据库增加 timezone

[root@yaoge123]# mysql_tzinfo_to_sql /usr/share/zoneinfo/ | mysql -u root mysql

新建一个文件 /etc/my.cnf.d/cacti.cnf ,内容供参考根据实际情况修改

[mysqld]
character-set-client = utf8mb4
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
innodb_additional_mem_pool_size = 80M
innodb_buffer_pool_size = 1024M
innodb_doublewrite = ON
innodb_file_format = Barracuda
innodb_file_per_table = ON
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_large_prefix = ON
join_buffer_size = 748M
max_allowed_packet = 16777216
max_heap_table_size = 374M
tmp_table_size = 374M

重启相关服务,设置开机自动启动

systemctl restart mariadb
systemctl enable mariadb
systemctl restart httpd
systemctl enable httpd

导入数据库

[root@yaoge123]# mysql cacti < /usr/share/doc/cacti-*/cacti.sql

浏览器打开 http://<server>/cacti/ ,默认用户名密码为 admin/admin

HPE ProLiant DL380 Gen10 不同BIOS设置内存性能测试

硬件环境

2*Intel(R) Xeon(R) Gold 5122 CPU @ 3.60GHz
12*HPE SmartMemory DDR4-2666 RDIMM 16GiB

iLO 5 1.37 Oct 25 2018
System ROM U30 v1.46 (10/02/2018)
Intelligent Platform Abstraction Data 7.2.0 Build 30
System Programmable Logic Device 0x2A
Power Management Controller Firmware 1.0.4
NVMe Backplane Firmware 1.20
Power Supply Firmware 1.00
Power Supply Firmware 1.00
Innovation Engine (IE) Firmware 0.1.6.1
Server Platform Services (SPS) Firmware 4.0.4.288
Redundant System ROM U30 v1.42 (06/20/2018)
Intelligent Provisioning 3.20.154
Power Management Controller FW Bootloader 1.1
HPE Smart Storage Battery 1 Firmware 0.60
HPE Eth 10/25Gb 2p 631FLR-SFP28 Adptr 212.0.103001
HPE Ethernet 1Gb 4-port 331i Adapter – NIC 20.12.41
HPE Smart Array P816i-a SR Gen10 1.65
HPE 100Gb 1p OP101 QSFP28 x16 OPA Adptr 1.5.2.0.0
HPE InfiniBand EDR/Ethernet 100Gb 2-port 840QSF 12.22.40.30
Embedded Video Controller 2.5

软件环境

CentOS Linux release 7.6.1810 (Core)
Linux yaoge123 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Intel(R) Memory Latency Checker – v3.6

Continue reading

OpenLDAP 升级报错pwdMaxRecordedFailure不存在

升级到OpenLDAP 2.4.44,出现以下错误

User Schema load failed for attribute "pwdMaxRecordedFailure". Error code 17: attribute type undefined
config error processing olcOverlay={1}ppolicy,olcDatabase={2}hdb,cn=config: User Schema load failed for attribute "pwdMaxRecordedFailure". Erro...ype undefined
slapd stopped.

解决办法

cd /etc/openldap/slapd.d/cn=config/cn=schema
mv cn\=\{3\}ppolicy.ldif cn\=\{3\}ppolicy.ldif.bak
mv /etc/openldap/schema/ppolicy.ldif cn\=\{3\}ppolicy.ldif

 

OpenLDAP 密码策略

OpenLDAP默认是没有密码检查策略的,123456这也得密码也能接受,这显然是管理员不希望看到的。

  1. 导入密码策略schema
    ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f /etc/openldap/schema/ppolicy.ldif
  2. 加载模块,因为已经添加过syncprov模块了,所以只要追加ppolicy模块就可以了
    dn: cn=module{0},cn=config
    changetype: modify
    add: olcModuleLoad
    olcModuleLoad: ppolicy.la
    
    ldapmodify -Y EXTERNAL -H ldapi:/// -f mod_ppolicy.ldif
  3. 指定默认策略dn名
    dn: olcOverlay=ppolicy,olcDatabase={2}hdb,cn=config
    changeType: add
    objectClass: olcOverlayConfig
    objectClass: olcPPolicyConfig
    olcOverlay: ppolicy
    olcPPolicyDefault: cn=default,ou=ppolicy,dc=yaoge123,dc=com
    olcPPolicyHashCleartext: TRUE
    ldapmodify -Y EXTERNAL -H ldapi:/// -f ppolicy.ldif
  4. 创建默认策略
    dn: ou=ppolicy,dc=yaoge123,dc=com
    objectClass: organizationalUnit
    objectClass: top
    ou: ppolicy
    
    dn: cn=default,ou=ppolicy,dc=yaoge123,dc=com
    cn: default
    objectClass: top
    objectClass: device
    objectClass: pwdPolicy
    objectClass: pwdPolicyChecker
    pwdAllowUserChange: TRUE
    pwdAttribute: userPassword
    pwdCheckQuality: 2
    pwdExpireWarning: 604800
    pwdFailureCountInterval: 0
    pwdGraceAuthnLimit: 5
    pwdInHistory: 5
    pwdLockout: TRUE
    pwdLockoutDuration: 600
    pwdMaxAge: 0
    pwdMaxFailure: 5
    pwdMinAge: 0
    pwdMinLength: 8
    pwdMustChange: FALSE
    pwdSafeModify: FALSE
    pwdCheckModule: check_password.so
    ldapadd -Y EXTERNAL -H ldapi:/// -f defaultppolicy.ldif
  5. 修改/etc/openldap/check_password.conf,定义check_password.so规则
  6. MirrorMode的两台LDAP均需进行上述同样的配置

CentOS 7 下 OpenLDAP 安装配置

目标是ldap1和ldap2做成高可用LDAP为集群中所有节点提供身份验证服务。

一、LDAP服务端,ldap1和ldap2均需安装配置

安装OpenLDAP并导入基本定义

yum install -y openldap openldap-clients openldap-servers
cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
chown -R ldap:ldap /var/lib/ldap
systemctl enable slapd.service
systemctl start slapd.service
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f /etc/openldap/schema/cosine.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f /etc/openldap/schema/nis.ldif

修改LDAP基本配置

  1. 创建db.ldif,内容见后
  2. olcRootDN就是LDAP的超级用户
  3. 用slappasswd生成密码的哈希填入olcRootPW中做为olcRootDN的密码
  4. 修改dc=为自己的域
  5. monitor默认所有用户均可读取,通过添加olcAccess改为只有root和RootDN可以读取
    dn: olcDatabase={2}hdb,cn=config
    changetype: modify
    replace: olcSuffix
    olcSuffix: dc=yaoge123,dc=com
    
    dn: olcDatabase={2}hdb,cn=config
    changetype: modify
    replace: olcRootDN
    olcRootDN: cn=Manager,dc=yaoge123,dc=com
    
    dn: olcDatabase={2}hdb,cn=config
    changetype: modify
    replace: olcRootPW
    olcRootPW: {SSHA}lY3iu244B87mEjUzSyHboD3x0tjTRHCV
    
    dn: cn=config
    changetype: modify
    replace: olcLogLevel
    olcLogLevel: stats2 shell sync
    
    dn: olcDatabase={1}monitor,cn=config
    changetype: modify
    replace: olcAccess
    olcAccess: {0}to * by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" read
      by dn.base="cn=Manager,dc=yaoge123,dc=com" read
      by * none
  6. 用ldapmodify导入
    ldapmodify -Y EXTERNAL -H ldapi:/// -f db.ldif

生成证书并设置TLS

  1. OpenLDAP默认使用明码传输,为了加密数据必须生成证书并配置TLS
  2. 在一个安全的服务器上生成CA的私钥和证书,有效期20年
    openssl req -new -x509 -nodes -out ca-cert.pem -keyout ca-key.pem -days 7305
  3. 在LDAP服务器生成私钥和证书请求,证书CN需为客户端访问使用的地址(域名、机器名、IP)
    openssl req -new -nodes -out cert.csr -keyout key.pem
  4. 用CA签发证书,有效期20年
    openssl x509 -req -in cert.csr -CAkey ca-key.pem -CA ca-cert.pem -out cert.pem -set_serial 01 -days 7305
  5. 把CA证书、服务器私钥和证书放到/etc/openldap/certs下,并修改属主和权限
    mv ca-cert.pem cert.pem key.pem /etc/openldap/certs/
    chown -R ldap:ldap /etc/openldap/certs/{ca-cert,cert,key}.pem
    chmod 644 /etc/openldap/certs/ca-cert.pem
    chmod 644 /etc/openldap/certs/cert.pem
    chmod 600 /etc/openldap/certs/key.pem
  6. 创建certs.ldif
    dn: cn=config
    changetype: modify
    replace: olcTLSCACertificateFile
    olcTLSCACertificateFile: /etc/openldap/certs/ca-cert.pem
    
    dn: cn=config
    changetype: modify
    replace: olcTLSCertificateFile
    olcTLSCertificateFile: /etc/openldap/certs/cert.pem
    
    dn: cn=config
    changetype: modify
    replace: olcTLSCertificateKeyFile
    olcTLSCertificateKeyFile: /etc/openldap/certs/key.pem
  7. 用ldapmodify导入
    ldapmodify -Y EXTERNAL -H ldapi:/// -f certs.ldif
  8. 修改LDAP服务端配置/etc/sysconfig/slapd,只启用ldapi和ldaps
    SLAPD_URLS="ldapi:/// ldaps:///"
  9. 重启LDAP服务
    systemctl restart slapd.service
  10. 验证TLS,可以看到证书链中有CA和LDAP服务器两个证书,最后显示Verify return code: 0 (ok)
    openssl s_client -connect ldap1:636 -showcerts -state -CAfile /etc/openldap/certs/ca-cert.pem
    
  11. ldap1已完成,从3.生成私钥和证书请求开始为ldap2配置,注意签发证书时set_serial需要不同,签发证书有效期需在CA证书有效期内
  12. 检查证书
    openssl req -in cert.csr -noout -text  //查看证书请求文件
    openssl x509 -in cert.pem -noout -text  //查看证书

创建自己的域

  1. 创建一个文本文件base.ldif,内容见后
  2. 修改dc为自己的域名
    dn: dc=yaoge123,dc=com
    dc: yaoge123
    objectClass: top
    objectClass: domain
    
    dn: ou=People,dc=yaoge123,dc=com
    ou: People
    objectClass: top
    objectClass: organizationalUnit
    
    dn: ou=Group,dc=yaoge123,dc=com
    ou: Group
    objectClass: top
    objectClass: organizationalUnit
  3. 用ldapadd导入,输入RootDN密码
    ldapadd -x -W -D cn=Manager,dc=yaoge123,dc=com -H ldapi:/// -f base.ldif

LDAP复制

  1. OpenLDAP 2.4支持5种复制方式,其中MirrorMode提供LDAP读写高可用性,双机Active-Active Hot-Standby,前端写入一个服务器即可,双机互相同步复制,只有一个服务器的情况下也支持写入
  2. 创建mod_syncprov.ldif
    dn: cn=module,cn=config
    objectClass: olcModuleList
    cn: module
    olcModulePath: /usr/lib64/openldap
    olcModuleLoad: syncprov.la
  3. 用ldapadd导入
    ldapadd -Y EXTERNAL -H ldapi:/// -f mod_syncprov.ldif
  4. 创建syncprov.ldif
    dn: olcDatabase={2}hdb,cn=config
    changetype: modify
    add: olcDbIndex
    olcDbIndex: entryCSN,entryUUID eq
    
    dn: olcOverlay=syncprov,olcDatabase={2}hdb,cn=config
    changeType: add
    objectClass: olcOverlayConfig
    objectClass: olcSyncProvConfig
    olcOverlay: syncprov
    olcSpCheckpoint: 100 10
    olcSpSessionLog: 100
  5. 用ldapmodify导入
    ldapmodify -Y EXTERNAL -H ldapi:/// -f syncprov.ldif
  6. 创建syncuser.ldif,创建用于同步的用户,用slappasswd生成密码的哈希填入userPassword中做为同步用户的密码
    dn: cn=ldapreader,dc=yaoge123,dc=com
    objectClass: simpleSecurityObject
    objectClass: organizationalRole
    cn: ldapreader
    description: LDAP reader user 
    userPassword: {SSHA}95M+f4bXaOF4DwJ5HdMY75kkqNXEFJRU
  7. 用ldapadd导入,输入RootDN密码
    ldapadd -x -W -D cn=Manager,dc=yaoge123,dc=com -H ldapi:/// -f syncuser.ldif
  8. 创建ldap1sync.ldif,credentials需为同步用户的密码
    dn: cn=config
    changeType: modify
    add: olcServerID
    olcServerID: 1
    
    dn: olcDatabase={2}hdb,cn=config
    changeType: modify
    add: olcSyncrepl
    olcSyncrepl: rid=001 provider=ldaps://ldap2 bindmethod=simple binddn="cn=ldapreader,dc=yaoge123,dc=com" credentials=yaoge123 searchbase="dc=yaoge123,dc=com" schemachecking=on type=refreshAndPersist retry="60 +" tls_cacert=/etc/openldap/certs/ca-cert.pem
    -
    add: olcMirrorMode
    olcMirrorMode: TRUE
  9. 在ldap1上用ldapmodify导入
    ldapmodify -Y EXTERNAL -H ldapi:/// -f ldap1sync.ldif
  10. 创建ldap2sync.ldif,credentials需为同步用户的密码
    dn: cn=config
    changeType: modify
    add: olcServerID
    olcServerID: 2
    
    dn: olcDatabase={2}hdb,cn=config
    changeType: modify
    add: olcSyncrepl
    olcSyncrepl: rid=001 provider=ldaps://ldap1 bindmethod=simple binddn="cn=ldapreader,dc=yaoge123,dc=com" credentials=yaoge123 searchbase="dc=yaoge123,dc=com" schemachecking=on type=refreshAndPersist retry="60 +" tls_cacert=/etc/openldap/certs/ca-cert.pem
    -
    add: olcMirrorMode
    olcMirrorMode: TRUE
  11. 在ldap2上用ldapmodify导入
    ldapmodify -Y EXTERNAL -H ldapi:/// -f ldap2sync.ldif
  12. 修改其中一个服务器上的LDAP,查看另一个服务器是否自动同步。
  13. 如果同步有问题,用调试模式运行ldap
    /usr/sbin/slapd -u ldap -g ldap -h "ldapi:// ldaps://" -d -1

迁移已有用户

  1. 安装迁移工具
    yum install -y migrationtools
  2. vi /usr/share/migrationtools/migrate_common.ph,修改默认dc
    # Default DNS domain
    $DEFAULT_MAIL_DOMAIN = "yaoge123.com";
    
    # Default base
    $DEFAULT_BASE = "dc=yaoge123,dc=com";
    
  3. 处理一下passwd和group,把系统账号都去掉,对于RHEL7/CentOS7来说,系统账号都是<1000的,migrate_passwd.pl会从/etc/shadow提取密码信息
    grep ":10[0-9][0-9]" /etc/passwd > passwd
    grep ":10[0-9][0-9]" /etc/group > group
    /usr/share/migrationtools/migrate_passwd.pl passwd users.ldif
    /usr/share/migrationtools/migrate_group.pl group groups.ldif
  4. 导入LDAP中,输入RootDN密码
    ldapadd -x -W -D "cn=Manager,dc=yaoge123,dc=com" -H ldapi:/// -f users.ldif
    ldapadd -x -W -D "cn=Manager,dc=yaoge123,dc=com" -H ldapi:/// -f groups.ldif
    
  5. 创建index.ldif,做一些索引,最好日志中没有bdb_equality_candidates
    dn: olcDatabase={2}hdb,cn=config
    changetype: modify
    add: olcDbIndex
    olcDbIndex: uid,uidNumber,gidNumber,member,memberUid eq
  6. 导入LDAP中
    ldapmodify -Y EXTERNAL -H ldapi:/// -f index.ldif

修改LDAP ACL:

  1. 密码和密码修改时间,RootDN可管理,自己可写,同步用户可读,匿名和普通用户只能Bind进行认证
  2. 其它所有RootDN可管理,匿名和普通用户可读
  3. 以上ACL保证了密码安全性,做成一个access.ldif文件
    dn: olcDatabase={2}hdb,cn=config
    changetype: modify
    replace: olcAccess
    olcAccess: {0}to dn.children="dc=nnlmhpcc" attrs=userPassword,shadowLastChange
      by dn="cn=Manager,dc=nnlmhpcc" manage
      by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
      by dn="cn=ldapreader,dc=nnlmhpcc" read
      by self write
      by * auth
    olcAccess: {1}to *
      by dn="cn=Manager,dc=nnlmhpcc" manage
      by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
      by * read
    
  4. 导入LDAP中,输入RootDN密码
    ldapmodify -Y EXTERNAL -H ldapi:/// -f access.ldif
  5. 确认匿名和普通用户无法查看密码,两台LDAP同步正常(含密码),用户自己可以自己修改密码
  6. 如果有问题,用ldapmodify -Y EXTERNAL -H ldapi:/// -f导入下面的文件打开ACL的日志检查,检查完别忘了把ACL日志关掉
    dn: cn=config
    changetype: modify
    replace: olcLogLevel
    olcLogLevel: acl stats stats2 shell sync

LDAP认证的客户端:

  1. 安装nss-pam-ldapd
    yum install -y nss-pam-ldapd
  2. 配置普通用户使用LDAP认证,因为已经使用ldaps故需disableldaptls,bashdn下必须包含People和Group
    authconfig --enableldap --enableldapauth --ldapserver="ldaps://ldap1,ldaps://ldap2" --ldapbasedn="dc=yaoge123,dc=com" --disableldaptls --ldaploadcacert=http://www.yaoge123.com/ca-cert.pem --updateall
  3. 测试用户登录、修改密码是否正常。

对于虚拟化部署的ldap1和ldap2,需要添加规则让两个虚机不在同一个主机上运行。

集群节点较多时,slapd会报错Too many open files。参考http://smilejay.com/2016/06/centos-7-systemd-conf-limits/和http://www.cnblogs.com/chris-cp/p/6667753.html,修改slapd的Max open files限制, 查看限制:

grep files /proc/`pidof slapd`/limits

 

 

xCAT 安装迁移

新机器安装系统,配置yun源、Hostname、Timezone、resolv.conf、路由和每个网卡的IP,yum upgrade,禁用SELinux和Firewall,hosts中配置本机主机名和ip,如公网网卡采取DHCP 配置中需添加PEERDNS=no 和 IPV6_PEERDNS=no

VM虚拟机中安装vmware tools,vmware-toolbox-cmd timesync status确认时间同步是否启用,vmware-toolbox-cmd timesync enable启用虚拟机和主机的时间同步

yum install rsync net-snmp-utils

重启后使用go-xcat install 自动化安装xcat

tabedit site:添加修改dhcpinterfaces、managedaddressmode、domain、master、dnsinterfaces、extntpservers;确认forwarders、nameservers。

tabedit networks:确认修改mgtifname、gateway、dhcpserver、tftpserver、ntpservers

修改/etc/resolv.conf,search为site表中的domain,nameserver为xcat自身

makedns,测试dns是否正常

修改/etc/chrony.conf,测试ntp是否正常

修改/etc/exports,限定IP地址范围

修改/etc/httpd/conf/httpd.conf,限定只监听内网的80端口

修改/etc/logrotate.conf,满足合规性要求,增加日志保留时间并启用压缩

迁移/etc/hosts.deny和hosts.all,配置只允许指定IP进行远程登录

修改/etc/postfix/main.cf中的myhostname和inet_interfaces

拷贝旧机器/install下的os image、自定义脚本等到新机器下

tabedit passwd:添加system的用户名密码,密码可以用openssl passwd -1加密

旧机器导出xCAT数据库dumpxCATdb -p /tmp/db,至少将nodelist chain bootparams nodetype mac hosts postscripts noderes nodehm osimage linuximage osdistro ipmi mp mpa等自定义表在新机器上restorexCATdb -p导入

如果需要迁移eventlog和auditlog,导入导出需添加-a参数,auditlog因为比较大导入非常慢

迁移/var/log下的日志

替换root ssh key为新的

迁移/etc/cron.d下的自定义定时任务

Linux DHCP 下自定义路由和网关

主机IP必须通过DHCP获得,但是因故需要重新指定网关并做策略路由。例如主机DHCP获取IP段192.168.1.0/24,DHCP获取网关192.168.1.1,拟将默认路由改为192.168.1.2,本地IP仍然走网关192.168.1.1

RHEL(CentOS) 6/7

/etc/sysconfig/network-scripts/ifcfg-eth0
BOOTPROTO=dhcp
NM_CONTROLLED="no"
ONBOOT=yes
GATEWAY=192.168.1.2
……

/etc/sysconfig/network-scripts/route-eth0
192.168.0.0/16 via 192.168.1.1

/etc/sysconfig/network
NETWORKING=yes
……

Debian 7

/etc/network/interfaces
……
up route del default dev eth0
up route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.1.1 dev eth0
up route add default gw 192.168.1.2 dev eth0
……

Suse 11

/etc/sysconfig/network/routes
……
192.168.0.0 192.168.1.1 255.255.0.0 eth0
default 192.168.1.2 - -