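On Ubuntu 20.04 LTS, the pool has to be imported by hand with zpool import after every boot; otherwise the ZFS file systems are not mounted.
Run /sbin/zpool import -c /etc/zfs/zpool.cache -aN manually; if it fails, refresh zpool.cache with zpool reguid <pool>.
To make the import wait until multipathd has started, create the directory and file /etc/systemd/system/zfs-import-cache.service.d/override.conf with the following content: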
[Unit]
After=multipathd.service
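The drop-in above can also be written from the shell. This is a minimal sketch, assuming multipathd.service is the unit that brings up the pool's devices (as in the override above); the function name write_zfs_override and its optional ROOT prefix are illustrative and not part of any tool.

```shell
#!/bin/sh
# write_zfs_override [ROOT]
# Writes the drop-in that orders zfs-import-cache.service after multipathd.service.
# ROOT is a hypothetical path prefix (default: empty, i.e. the real /etc) so the
# function can be tried against a scratch directory first.
write_zfs_override() {
    root="${1:-}"
    dir="$root/etc/systemd/system/zfs-import-cache.service.d"
    mkdir -p "$dir" || return 1
    printf '[Unit]\nAfter=multipathd.service\n' > "$dir/override.conf"
}

# On the real host (as root):
#   write_zfs_override
#   systemctl daemon-reload
#   systemctl cat zfs-import-cache.service   # the drop-in should be listed
```

systemd merges the drop-in with the packaged unit, so the original unit file stays untouched and survives package upgrades.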
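1. First install NVIDIA Data Center GPU Manager (DCGM), downloaded from https://developer.nvidia.com/dcgm. Stop the running host engine and remove any old package before installing the new one: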
nv-hostengine -t
yum erase -y datacenter-gpu-manager
rpm -ivh datacenter-gpu-manager*
systemctl enable --now dcgm.service
2. Install the NVIDIA DCGM exporter for Prometheus manually, downloaded from https://github.com/NVIDIA/gpu-monitoring-tools/tree/master/exporters/prometheus-dcgm
wget -q -O /usr/local/bin/dcgm-exporter https://raw.githubusercontent.com/NVIDIA/gpu-monitoring-tools/master/exporters/prometheus-dcgm/dcgm-exporter/dcgm-exporter
chmod +x /usr/local/bin/dcgm-exporter
mkdir /run/prometheus
wget -q -O /etc/systemd/system/prometheus-dcgm.service https://raw.githubusercontent.com/NVIDIA/gpu-monitoring-tools/master/exporters/prometheus-dcgm/bare-metal/prometheus-dcgm.service
systemctl daemon-reload
systemctl enable --now prometheus-dcgm.service
3. Download node_exporter from https://prometheus.io/download/#node_exporter, install it manually as a service, and point its textfile collector at the dcgm-exporter output
tar xf node_exporter*.tar.gz
mv node_exporter-*/node_exporter /usr/local/bin/
chown root:root /usr/local/bin/node_exporter
chmod +x /usr/local/bin/node_exporter
cat > /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOF
sed -i '/ExecStart=\/usr\/local\/bin\/node_exporter/c\ExecStart=\/usr\/local\/bin\/node_exporter --collector.textfile.directory=\/run\/prometheus' /etc/systemd/system/node_exporter.service
systemctl daemon-reload
systemctl enable --now node_exporter.service
4. Add this dashboard in Grafana:
https://grafana.com/grafana/dashboards/11752
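Before importing the dashboard it helps to confirm that node_exporter really re-exports the GPU metrics. This is a minimal sketch, assuming node_exporter listens on its default port 9100 and that the exporter's metric names start with dcgm_; the function name count_dcgm_metrics is illustrative.

```shell
#!/bin/sh
# count_dcgm_metrics: reads Prometheus text-format metrics on stdin and prints
# how many sample lines start with "dcgm_" (HELP/TYPE comments and all other
# metrics are ignored). Prints 0 when none are found.
count_dcgm_metrics() {
    grep -c '^dcgm_' || true
}

# On the monitored host:
#   curl -s http://localhost:9100/metrics | count_dcgm_metrics
```

A result of 0 usually means the textfile directory passed to node_exporter does not match the directory dcgm-exporter writes to.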
Add the webroot and configuration mounts to the nginx docker run command:
-v $PWD/nginx/letsencrypt/:/var/www/letsencrypt:ro \
-v $PWD/letsencrypt/etc/:/etc/nginx/letsencrypt/:ro \
Serve the webroot from nginx:
location ^~ /.well-known/ {
    root /var/www/letsencrypt/;
}
Configure the certificate files in nginx:
ssl_certificate letsencrypt/live/www.yaoge123.com/fullchain.pem;
ssl_certificate_key letsencrypt/live/www.yaoge123.com/privkey.pem;
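Create a docker run script for certbot; from then on, running this script periodically is enough to renew the certificate automatically.
#!/bin/sh
# Run from the directory that holds this script so the $PWD-relative mounts resolve.
cd "$(dirname "$0")" || exit 1
pwd
# No -it here: a cron job has no TTY, and certbot already runs non-interactively (-n).
docker run --rm \
    -v "$PWD/letsencrypt/etc:/etc/letsencrypt" \
    -v "$PWD/letsencrypt/lib:/var/lib/letsencrypt" \
    -v "$PWD/letsencrypt/log:/var/log/letsencrypt" \
    -v "$PWD/nginx/letsencrypt:/var/www" \
    certbot/certbot \
    certonly --webroot \
    --email yaoge123@example.com --agree-tos --no-eff-email \
    --webroot-path=/var/www/ \
    -n \
    --domains www.yaoge123.com
# Signal nginx to reload its configuration and pick up the renewed certificate.
docker kill --signal=HUP nginx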
Enable EPEL first, then install Cacti and a Chinese font with yum:
yum install cacti cacti-spine mariadb-server google-noto-sans-simplified-chinese-fonts
Edit /etc/httpd/conf.d/cacti.conf and, inside the Directory /usr/share/cacti/ block, add the browser clients that are allowed access.
Edit /etc/cron.d/cacti and uncomment the job line.
Edit /etc/spine.conf and comment out the RDB_* settings.
Create the database:
[root@yaoge123]# mysqladmin --user=root create cacti
Create the database user:
[root@yaoge123]# mysql --user=root mysql
MariaDB [mysql]> GRANT ALL ON cacti.* TO cactiuser@localhost IDENTIFIED BY 'cactiuser';
MariaDB [mysql]> flush privileges;
Grant the database user read access to the time zone table:
[root@yaoge123]# mysql -u root
MariaDB [(none)]> GRANT SELECT ON mysql.time_zone_name TO cactiuser@localhost IDENTIFIED BY 'cactiuser';
MariaDB [(none)]> flush privileges;
Load the time zone data into the database:
[root@yaoge123]# mysql_tzinfo_to_sql /usr/share/zoneinfo/ | mysql -u root mysql
Create a new file /etc/my.cnf.d/cacti.cnf; the contents below are for reference only and should be adjusted to the actual system:
[mysqld]
character-set-client = utf8mb4
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
innodb_additional_mem_pool_size = 80M
innodb_buffer_pool_size = 1024M
innodb_doublewrite = ON
innodb_file_format = Barracuda
innodb_file_per_table = ON
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_large_prefix = ON
join_buffer_size = 748M
max_allowed_packet = 16777216
max_heap_table_size = 374M
tmp_table_size = 374M
Restart the services and enable them at boot:
systemctl restart mariadb
systemctl enable mariadb
systemctl restart httpd
systemctl enable httpd
Import the database schema:
[root@yaoge123]# mysql cacti < /usr/share/doc/cacti-*/cacti.sql
Open http://<server>/cacti/ in a browser; the default username and password are admin/admin.
2*Intel(R) Xeon(R) Gold 5122 CPU @ 3.60GHz
12*HPE SmartMemory DDR4-2666 RDIMM 16GiB
iLO 5: 1.37 Oct 25 2018
System ROM: U30 v1.46 (10/02/2018)
Intelligent Platform Abstraction Data: 7.2.0 Build 30
System Programmable Logic Device: 0x2A
Power Management Controller Firmware: 1.0.4
NVMe Backplane Firmware: 1.20
Power Supply Firmware: 1.00
Power Supply Firmware: 1.00
Innovation Engine (IE) Firmware: 0.1.6.1
Server Platform Services (SPS) Firmware: 4.0.4.288
Redundant System ROM: U30 v1.42 (06/20/2018)
Intelligent Provisioning: 3.20.154
Power Management Controller FW Bootloader: 1.1
HPE Smart Storage Battery 1 Firmware: 0.60
HPE Eth 10/25Gb 2p 631FLR-SFP28 Adptr: 212.0.103001
HPE Ethernet 1Gb 4-port 331i Adapter – NIC: 20.12.41
HPE Smart Array P816i-a SR Gen10: 1.65
HPE 100Gb 1p OP101 QSFP28 x16 OPA Adptr: 1.5.2.0.0
HPE InfiniBand EDR/Ethernet 100Gb 2-port 840QSF: 12.22.40.30
Embedded Video Controller: 2.5
CentOS Linux release 7.6.1810 (Core)
Linux yaoge123 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Intel(R) Memory Latency Checker – v3.6
After upgrading to OpenLDAP 2.4.44, the following error appears:
User Schema load failed for attribute "pwdMaxRecordedFailure". Error code 17: attribute type undefined
config error processing olcOverlay={1}ppolicy,olcDatabase={2}hdb,cn=config: User Schema load failed for attribute "pwdMaxRecordedFailure". Erro...ype undefined
slapd stopped.
Fix:
cd /etc/openldap/slapd.d/cn=config/cn=schema
mv cn\=\{3\}ppolicy.ldif cn\=\{3\}ppolicy.ldif.bak
mv /etc/openldap/schema/ppolicy.ldif cn\=\{3\}ppolicy.ldif
By default OpenLDAP has no password-checking policy, so even a password such as 123456 is accepted, which is clearly not what an administrator wants.
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f /etc/openldap/schema/ppolicy.ldif
mod_ppolicy.ldif:
dn: cn=module{0},cn=config
changetype: modify
add: olcModuleLoad
olcModuleLoad: ppolicy.la
ldapmodify -Y EXTERNAL -H ldapi:/// -f mod_ppolicy.ldif
ppolicy.ldif:
dn: olcOverlay=ppolicy,olcDatabase={2}hdb,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcPPolicyConfig
olcOverlay: ppolicy
olcPPolicyDefault: cn=default,ou=ppolicy,dc=yaoge123,dc=com
olcPPolicyHashCleartext: TRUE
ldapmodify -Y EXTERNAL -H ldapi:/// -f ppolicy.ldif
defaultppolicy.ldif:
dn: ou=ppolicy,dc=yaoge123,dc=com
objectClass: organizationalUnit
objectClass: top
ou: ppolicy

dn: cn=default,ou=ppolicy,dc=yaoge123,dc=com
cn: default
objectClass: top
objectClass: device
objectClass: pwdPolicy
objectClass: pwdPolicyChecker
pwdAllowUserChange: TRUE
pwdAttribute: userPassword
pwdCheckQuality: 2
pwdExpireWarning: 604800
pwdFailureCountInterval: 0
pwdGraceAuthnLimit: 5
pwdInHistory: 5
pwdLockout: TRUE
pwdLockoutDuration: 600
pwdMaxAge: 0
pwdMaxFailure: 5
pwdMinAge: 0
pwdMinLength: 8
pwdMustChange: FALSE
pwdSafeModify: FALSE
pwdCheckModule: check_password.so
ldapadd -Y EXTERNAL -H ldapi:/// -f defaultppolicy.ldif
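The time-based attributes in the default policy (pwdExpireWarning, pwdLockoutDuration, pwdMaxAge) are given in seconds, which is hard to read at a glance. The small helper below is only an illustrative sketch for checking such values; secs_to_human is a hypothetical name, not an OpenLDAP tool.

```shell
#!/bin/sh
# secs_to_human SECONDS: prints a duration as "<d>d <h>h <m>m <s>s".
secs_to_human() {
    s=$1
    printf '%dd %dh %dm %ds\n' \
        $((s / 86400)) $((s % 86400 / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Values from the policy above:
#   secs_to_human 604800   # pwdExpireWarning
#   secs_to_human 600      # pwdLockoutDuration
```

With the values above, the expiry warning window is 7 days and an account is locked for 10 minutes after 5 failed binds; pwdMaxAge: 0 means passwords never expire.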
The goal is to run ldap1 and ldap2 as a highly available LDAP pair that provides authentication for all nodes in the cluster.
I. LDAP servers (install and configure on both ldap1 and ldap2)
Install OpenLDAP and import the basic schemas:
yum install -y openldap openldap-clients openldap-servers
cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
chown -R ldap:ldap /var/lib/ldap
systemctl enable slapd.service
systemctl start slapd.service
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f /etc/openldap/schema/cosine.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f /etc/openldap/schema/nis.ldif
Change the basic LDAP configuration (db.ldif):
dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=yaoge123,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootDN
olcRootDN: cn=Manager,dc=yaoge123,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {SSHA}lY3iu244B87mEjUzSyHboD3x0tjTRHCV

dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: stats2 shell sync

dn: olcDatabase={1}monitor,cn=config
changetype: modify
replace: olcAccess
olcAccess: {0}to *
  by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" read
  by dn.base="cn=Manager,dc=yaoge123,dc=com" read
  by * none
ldapmodify -Y EXTERNAL -H ldapi:/// -f db.ldif
Generate the certificates and configure TLS:
openssl req -new -x509 -nodes -out ca-cert.pem -keyout ca-key.pem -days 7305
openssl req -new -nodes -out cert.csr -keyout key.pem
openssl x509 -req -in cert.csr -CAkey ca-key.pem -CA ca-cert.pem -out cert.pem -set_serial 01 -days 7305
mv ca-cert.pem cert.pem key.pem /etc/openldap/certs/
chown -R ldap:ldap /etc/openldap/certs/{ca-cert,cert,key}.pem
chmod 644 /etc/openldap/certs/ca-cert.pem
chmod 644 /etc/openldap/certs/cert.pem
chmod 600 /etc/openldap/certs/key.pem
certs.ldif:
dn: cn=config
changetype: modify
replace: olcTLSCACertificateFile
olcTLSCACertificateFile: /etc/openldap/certs/ca-cert.pem

dn: cn=config
changetype: modify
replace: olcTLSCertificateFile
olcTLSCertificateFile: /etc/openldap/certs/cert.pem

dn: cn=config
changetype: modify
replace: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /etc/openldap/certs/key.pem
ldapmodify -Y EXTERNAL -H ldapi:/// -f certs.ldif
In /etc/sysconfig/slapd, set:
SLAPD_URLS="ldapi:/// ldaps:///"
systemctl restart slapd.service
openssl s_client -connect ldap1:636 -showcerts -state -CAfile /etc/openldap/certs/ca-cert.pem
openssl req -in cert.csr -noout -text    # inspect the certificate signing request
openssl x509 -in cert.pem -noout -text   # inspect the certificate
Create your own domain (base.ldif):
dn: dc=yaoge123,dc=com
dc: yaoge123
objectClass: top
objectClass: domain

dn: ou=People,dc=yaoge123,dc=com
ou: People
objectClass: top
objectClass: organizationalUnit

dn: ou=Group,dc=yaoge123,dc=com
ou: Group
objectClass: top
objectClass: organizationalUnit
ldapadd -x -W -D cn=Manager,dc=yaoge123,dc=com -H ldapi:/// -f base.ldif
LDAP replication
mod_syncprov.ldif:
dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulePath: /usr/lib64/openldap
olcModuleLoad: syncprov.la
ldapadd -Y EXTERNAL -H ldapi:/// -f mod_syncprov.ldif
syncprov.ldif:
dn: olcDatabase={2}hdb,cn=config
changetype: modify
add: olcDbIndex
olcDbIndex: entryCSN,entryUUID eq

dn: olcOverlay=syncprov,olcDatabase={2}hdb,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov
olcSpCheckpoint: 100 10
olcSpSessionLog: 100
ldapmodify -Y EXTERNAL -H ldapi:/// -f syncprov.ldif
syncuser.ldif:
dn: cn=ldapreader,dc=yaoge123,dc=com
objectClass: simpleSecurityObject
objectClass: organizationalRole
cn: ldapreader
description: LDAP reader user
userPassword: {SSHA}95M+f4bXaOF4DwJ5HdMY75kkqNXEFJRU
ldapadd -x -W -D cn=Manager,dc=yaoge123,dc=com -H ldapi:/// -f syncuser.ldif
On ldap1 (ldap1sync.ldif):
dn: cn=config
changetype: modify
add: olcServerID
olcServerID: 1

dn: olcDatabase={2}hdb,cn=config
changetype: modify
add: olcSyncrepl
olcSyncrepl: rid=001
  provider=ldaps://ldap2
  bindmethod=simple
  binddn="cn=ldapreader,dc=yaoge123,dc=com"
  credentials=yaoge123
  searchbase="dc=yaoge123,dc=com"
  schemachecking=on
  type=refreshAndPersist
  retry="60 +"
  tls_cacert=/etc/openldap/certs/ca-cert.pem
-
add: olcMirrorMode
olcMirrorMode: TRUE
ldapmodify -Y EXTERNAL -H ldapi:/// -f ldap1sync.ldif
On ldap2 (ldap2sync.ldif):
dn: cn=config
changetype: modify
add: olcServerID
olcServerID: 2

dn: olcDatabase={2}hdb,cn=config
changetype: modify
add: olcSyncrepl
olcSyncrepl: rid=001
  provider=ldaps://ldap1
  bindmethod=simple
  binddn="cn=ldapreader,dc=yaoge123,dc=com"
  credentials=yaoge123
  searchbase="dc=yaoge123,dc=com"
  schemachecking=on
  type=refreshAndPersist
  retry="60 +"
  tls_cacert=/etc/openldap/certs/ca-cert.pem
-
add: olcMirrorMode
olcMirrorMode: TRUE
ldapmodify -Y EXTERNAL -H ldapi:/// -f ldap2sync.ldif
To troubleshoot, run slapd in the foreground with full debug output:
/usr/sbin/slapd -u ldap -g ldap -h "ldapi:// ldaps://" -d -1
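One common way to check that mirror-mode replication has converged is to compare the contextCSN values of the suffix entry on both servers. The sketch below only parses saved ldapsearch output; the function names extract_csn and same_csn are illustrative, and the ldapsearch invocation in the comments assumes the client already trusts the CA certificate created above.

```shell
#!/bin/sh
# extract_csn: reads ldapsearch (LDIF) output on stdin and prints the
# contextCSN values, sorted so that ordering differences do not matter.
extract_csn() {
    sed -n 's/^contextCSN: //p' | sort
}

# same_csn FILE1 FILE2: succeeds when both saved outputs carry the same CSN set.
same_csn() {
    [ "$(extract_csn < "$1")" = "$(extract_csn < "$2")" ]
}

# On a live pair (each command prompts for the Manager password):
#   ldapsearch -x -LLL -H ldaps://ldap1 -D cn=Manager,dc=yaoge123,dc=com -W \
#       -s base -b dc=yaoge123,dc=com contextCSN > ldap1.out
#   ldapsearch -x -LLL -H ldaps://ldap2 -D cn=Manager,dc=yaoge123,dc=com -W \
#       -s base -b dc=yaoge123,dc=com contextCSN > ldap2.out
#   same_csn ldap1.out ldap2.out && echo "in sync"
```

contextCSN is an operational attribute, so it has to be requested by name; a brief mismatch right after a write is normal and should disappear within seconds.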
Migrate existing users:
yum install -y migrationtools
In /usr/share/migrationtools/migrate_common.ph, set:
# Default DNS domain
$DEFAULT_MAIL_DOMAIN = "yaoge123.com";
# Default base
$DEFAULT_BASE = "dc=yaoge123,dc=com";
grep ":10[0-9][0-9]" /etc/passwd > passwd
grep ":10[0-9][0-9]" /etc/group > group
/usr/share/migrationtools/migrate_passwd.pl passwd users.ldif
/usr/share/migrationtools/migrate_group.pl group groups.ldif
ldapadd -x -W -D "cn=Manager,dc=yaoge123,dc=com" -H ldapi:/// -f users.ldif
ldapadd -x -W -D "cn=Manager,dc=yaoge123,dc=com" -H ldapi:/// -f groups.ldif
index.ldif:
dn: olcDatabase={2}hdb,cn=config
changetype: modify
add: olcDbIndex
olcDbIndex: uid,uidNumber,gidNumber,member,memberUid eq
ldapmodify -Y EXTERNAL -H ldapi:/// -f index.ldif
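The grep pattern ":10[0-9][0-9]" used when extracting accounts matches anywhere in the line, so it also picks up five-digit IDs such as 10050 and passwd lines where only the GID field matches. A stricter alternative is to compare the third field numerically. This is a sketch; filter_by_id is an illustrative name, and the range 1000-1099 is taken from the pattern above.

```shell
#!/bin/sh
# filter_by_id MIN MAX: reads passwd- or group-format lines on stdin and prints
# those whose third field (UID in passwd, GID in group) lies in [MIN, MAX].
filter_by_id() {
    awk -F: -v min="$1" -v max="$2" '$3 >= min && $3 <= max'
}

# Drop-in replacement for the two grep commands:
#   filter_by_id 1000 1099 < /etc/passwd > passwd
#   filter_by_id 1000 1099 < /etc/group  > group
```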
Modify the LDAP ACLs (access.ldif):
dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcAccess
olcAccess: {0}to dn.children="dc=yaoge123,dc=com" attrs=userPassword,shadowLastChange
  by dn="cn=Manager,dc=yaoge123,dc=com" manage
  by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
  by dn="cn=ldapreader,dc=yaoge123,dc=com" read
  by self write
  by * auth
olcAccess: {1}to *
  by dn="cn=Manager,dc=yaoge123,dc=com" manage
  by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
  by * read
ldapmodify -Y EXTERNAL -H ldapi:/// -f access.ldif
To trace ACL decisions, the log level can be extended with acl and stats:
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: acl stats stats2 shell sync
LDAP authentication clients:
yum install -y nss-pam-ldapd
authconfig --enableldap --enableldapauth --ldapserver="ldaps://ldap1,ldaps://ldap2" --ldapbasedn="dc=yaoge123,dc=com" --disableldaptls --ldaploadcacert=http://www.yaoge123.com/ca-cert.pem --updateall
If ldap1 and ldap2 are deployed as virtual machines, add an anti-affinity rule so that the two VMs never run on the same host.
With many cluster nodes, slapd reports "Too many open files". Following http://smilejay.com/2016/06/centos-7-systemd-conf-limits/ and http://www.cnblogs.com/chris-cp/p/6667753.html, raise slapd's Max open files limit. To view the current limit:
grep files /proc/`pidof slapd`/limits
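Under systemd the usual way to raise this limit is a drop-in that sets LimitNOFILE for the service. This is a minimal sketch, assuming slapd runs as slapd.service (as in the commands above); the drop-in filename limits.conf, the value 65536, and the function name write_slapd_nofile are illustrative choices, not taken from the original notes.

```shell
#!/bin/sh
# write_slapd_nofile LIMIT [ROOT]
# Writes a systemd drop-in that sets LimitNOFILE for slapd.service.
# ROOT is a hypothetical path prefix (default: empty, i.e. the real /etc) so the
# function can be tried against a scratch directory first.
write_slapd_nofile() {
    limit="$1"
    root="${2:-}"
    dir="$root/etc/systemd/system/slapd.service.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/limits.conf"
}

# On each LDAP server (as root):
#   write_slapd_nofile 65536
#   systemctl daemon-reload && systemctl restart slapd.service
#   grep files /proc/$(pidof slapd)/limits   # should now show the new value
```

Entries in /etc/security/limits.conf do not apply to services started by systemd, which is why the limit is set on the unit itself.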
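Install the OS on the new machine; configure the yum repositories, hostname, time zone, resolv.conf, routes and the IP of every NIC; run yum upgrade; disable SELinux and the firewall; add the local hostname and IP to hosts. If the public NIC uses DHCP, add PEERDNS=no and IPV6_PEERDNS=no to its configuration.
In a VMware virtual machine, install VMware Tools; vmware-toolbox-cmd timesync status shows whether time synchronization is enabled, and vmware-toolbox-cmd timesync enable turns on time synchronization between guest and host.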
yum install rsync net-snmp-utils
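After rebooting, install xCAT automatically with go-xcat install.
tabedit site: add or change dhcpinterfaces, managedaddressmode, domain, master, dnsinterfaces, extntpservers; verify forwarders and nameservers.
tabedit networks: verify and adjust mgtifname, gateway, dhcpserver, tftpserver, ntpservers.
Edit /etc/resolv.conf: set search to the domain from the site table and nameserver to the xCAT server itself.
Run makedns and test that DNS works.
Edit /etc/chrony.conf and test that NTP works.
Edit /etc/exports to restrict the allowed IP range.
Edit /etc/httpd/conf/httpd.conf so that it listens on port 80 of the internal network only.
Edit /etc/logrotate.conf to meet compliance requirements: keep logs longer and enable compression.
Migrate /etc/hosts.deny and hosts.allow so that only the specified IPs may log in remotely.
Edit myhostname and inet_interfaces in /etc/postfix/main.cf.
Copy the OS images, custom scripts, etc. under /install from the old machine to the new one.
tabedit passwd: add the username and password for system; the password can be hashed with openssl passwd -1.
On the old machine, export the xCAT database with dumpxCATdb -p /tmp/db; on the new machine, import at least the customized tables (nodelist chain bootparams nodetype mac hosts postscripts noderes nodehm osimage linuximage osdistro ipmi mp mpa, etc.) with restorexCATdb -p.
To migrate eventlog and auditlog as well, add the -a option to both the export and the import; auditlog is large, so importing it is very slow.
Migrate the logs under /var/log.
Replace the root SSH key with a new one.
Migrate the custom scheduled jobs under /etc/cron.d.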
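For the database step in the list above, one way to restore only selected tables is to copy their dump files into a separate directory and point restorexCATdb -p at that directory. This sketch assumes dumpxCATdb writes one <table>.csv file per table into the dump directory; pick_tables and the path /tmp/db-selected are illustrative names.

```shell
#!/bin/sh
# pick_tables SRC DST TABLE...: copies SRC/TABLE.csv into DST for each named
# table, reporting (but not failing on) tables that have no dump file.
pick_tables() {
    src="$1"; dst="$2"; shift 2
    mkdir -p "$dst" || return 1
    for t in "$@"; do
        if [ -f "$src/$t.csv" ]; then
            cp "$src/$t.csv" "$dst/"
        else
            echo "missing: $t" >&2
        fi
    done
}

# After copying /tmp/db from the old machine:
#   pick_tables /tmp/db /tmp/db-selected nodelist chain bootparams nodetype mac \
#       hosts postscripts noderes nodehm osimage linuximage osdistro ipmi mp mpa
#   restorexCATdb -p /tmp/db-selected
```

Leaving site and networks out of the selection keeps the values already set on the new machine with tabedit.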
The host must obtain its IP via DHCP, but for some reason the gateway has to be overridden and policy routing applied. For example, the host gets an address in 192.168.1.0/24 and the gateway 192.168.1.1 from DHCP; the default route is to be changed to 192.168.1.2, while traffic to local addresses still goes through the gateway 192.168.1.1.
RHEL(CentOS) 6/7
/etc/sysconfig/network-scripts/ifcfg-eth0
BOOTPROTO=dhcp
NM_CONTROLLED="no"
ONBOOT=yes
GATEWAY=192.168.1.2
……
/etc/sysconfig/network-scripts/route-eth0
192.168.0.0/16 via 192.168.1.1
/etc/sysconfig/network
NETWORKING=yes
……
Debian 7
/etc/network/interfaces
……
up route del default dev eth0
up route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.1.1 dev eth0
up route add default gw 192.168.1.2 dev eth0
……
Suse 11
/etc/sysconfig/network/routes
……
192.168.0.0 192.168.1.1 255.255.0.0 eth0
default 192.168.1.2 - -
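The effect of these routes can be reasoned about with longest-prefix matching. The sketch below models the RHEL and SUSE variants (192.168.0.0/16 via 192.168.1.1, default via 192.168.1.2, with 192.168.1.0/24 on-link); next_hop is a hypothetical helper for illustration only, and on a real host ip route get <address> gives the authoritative answer.

```shell
#!/bin/sh
# next_hop IPV4: prints which next hop the routing table above would select.
next_hop() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the dotted quad into $1..$4
    IFS=$old_ifs
    if [ "$1" = 192 ] && [ "$2" = 168 ] && [ "$3" = 1 ]; then
        echo "direct"            # 192.168.1.0/24 is on-link, no gateway needed
    elif [ "$1" = 192 ] && [ "$2" = 168 ]; then
        echo "192.168.1.1"       # 192.168.0.0/16 via the DHCP-supplied gateway
    else
        echo "192.168.1.2"       # everything else follows the new default route
    fi
}

# next_hop 192.168.20.5
# next_hop 8.8.8.8
```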