Hardware environment
2*Intel(R) Xeon(R) Gold 5122 CPU @ 3.60GHz
12*HPE SmartMemory DDR4-2666 RDIMM 16GiB
iLO 5 1.37 Oct 25 2018
System ROM U30 v1.46 (10/02/2018)
Intelligent Platform Abstraction Data 7.2.0 Build 30
System Programmable Logic Device 0x2A
Power Management Controller Firmware 1.0.4
NVMe Backplane Firmware 1.20
Power Supply Firmware 1.00
Power Supply Firmware 1.00
Innovation Engine (IE) Firmware 0.1.6.1
Server Platform Services (SPS) Firmware 4.0.4.288
Redundant System ROM U30 v1.42 (06/20/2018)
Intelligent Provisioning 3.20.154
Power Management Controller FW Bootloader 1.1
HPE Smart Storage Battery 1 Firmware 0.60
HPE Eth 10/25Gb 2p 631FLR-SFP28 Adptr 212.0.103001
HPE Ethernet 1Gb 4-port 331i Adapter – NIC 20.12.41
HPE Smart Array P816i-a SR Gen10 1.65
HPE 100Gb 1p OP101 QSFP28 x16 OPA Adptr 1.5.2.0.0
HPE InfiniBand EDR/Ethernet 100Gb 2-port 840QSF 12.22.40.30
Embedded Video Controller 2.5
Software environment
CentOS Linux release 7.6.1810 (Core)
Linux yaoge123 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Intel(R) Memory Latency Checker - v3.6
Test command
./mlc_avx512;./mlc_avx512 -Y;./mlc_avx512 -Z
Only the results of the last two runs (-Y and -Z) are kept.
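A minimal sketch of how the three invocations could be scripted so that the first run is discarded and only the last two outputs are kept. The binary path and flags come from the command above; the `run_all` helper and its `runner` parameter are hypothetical names introduced here for illustration and are not part of MLC.

```python
import subprocess

MLC = "./mlc_avx512"              # binary name taken from the test command
FLAG_SETS = [[], ["-Y"], ["-Z"]]  # same order as in the test command


def run_all(runner=subprocess.run):
    """Run the three invocations in order and return the last two outputs.

    `runner` is a hypothetical injection point (defaults to subprocess.run)
    so the function can be exercised without the real binary.
    """
    outputs = []
    for flags in FLAG_SETS:
        cmd = [MLC, *flags]
        result = runner(cmd, capture_output=True, text=True, check=True)
        outputs.append((" ".join(cmd), result.stdout))
    # The plain run comes first and its output is dropped.
    return outputs[1:]
```

Calling `run_all()` on the test machine would return two `(command, stdout)` pairs, which could then be saved once per BIOS configuration.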
Results
Advanced ECC
Workload Profile: General Power Efficient Compute
Intel (R) Hyperthreading Options: Disabled
Enabled Cores per Processor: 0
Processor x2APIC Support: Disabled
Intel(R) Virtualization Technology (Intel VT): Disabled
Intel (R) VT-d: Disabled
SR-IOV: Disabled
Advanced Memory Protection: Advanced ECC
Power Regulator: Dynamic Power Savings Mode
total used free shared buff/cache available
Mem: 188G 1.5G 186G 9.7M 222M 186G
Swap: 4.0G 0B 4.0G
./mlc_avx512 -Y
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1
0 81.7 136.6
1 133.1 76.9
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 99954.8
3:1 Reads-Writes : 145189.0
2:1 Reads-Writes : 151137.4
1:1 Reads-Writes : 150854.9
Stream-triad like: 101469.8
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1
0 49976.5 31093.5
1 30873.7 50050.0
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 88.68 89235.1
00002 88.73 89335.3
00008 88.65 89217.6
00015 88.76 88243.5
00050 89.40 85207.4
00100 84.77 66373.2
00200 83.76 45796.8
00300 83.08 35407.3
00400 82.67 28352.7
00500 82.29 23883.3
00700 81.76 17950.3
01000 81.61 13123.4
01300 81.48 10414.7
01700 81.54 8249.4
02500 81.43 5930.9
03500 81.43 4481.2
05000 81.47 3384.8
09000 81.63 2226.1
20000 81.44 1440.1
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 47.4
Local Socket L2->L2 HITM latency 47.5
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 112.5
1 114.0 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 178.3
1 180.4 -
./mlc_avx512 -Z
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1
0 81.6 137.7
1 133.3 77.5
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 104428.9
3:1 Reads-Writes : 138213.4
2:1 Reads-Writes : 148680.5
1:1 Reads-Writes : 148760.9
Stream-triad like: 97657.5
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1
0 52281.3 31156.8
1 31071.8 51930.0
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 88.14 92987.7
00002 88.07 92861.1
00008 86.46 87868.3
00015 87.76 87068.6
00050 89.06 84397.9
00100 84.80 65169.7
00200 83.98 45178.1
00300 83.05 34780.6
00400 82.47 27854.5
00500 82.08 23249.2
00700 81.80 17532.0
01000 81.58 12815.7
01300 81.51 10190.1
01700 81.51 8062.1
02500 81.35 5794.3
03500 81.40 4386.5
05000 81.29 3314.9
09000 81.06 2193.8
20000 80.98 1427.2
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 47.4
Local Socket L2->L2 HITM latency 47.4
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 112.5
1 114.1 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 178.7
1 180.7 -
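To compare runs without copying numbers by hand, the peak injection bandwidth lines can be pulled out of the raw output. The following is a rough sketch that assumes the line layout shown in the output above; `parse_peak_bandwidth` is a hypothetical helper, not something MLC provides.

```python
import re

# Matches lines such as "ALL Reads : 99954.8" or "Stream-triad like: 101469.8".
PEAK_LINE = re.compile(
    r"^(ALL Reads|\d:\d Reads-Writes|Stream-triad like)\s*:\s*([0-9.]+)$"
)


def parse_peak_bandwidth(text):
    """Return {traffic type: bandwidth in MB/sec} for the peak injection section."""
    result = {}
    for line in text.splitlines():
        match = PEAK_LINE.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result
```

Feeding it the saved output of one run yields a five-entry dictionary that is easy to diff against another configuration.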
Online Spare Memory with Advanced ECC Support
Workload Profile: General Power Efficient Compute
Intel (R) Hyperthreading Options: Disabled
Enabled Cores per Processor: 0
Processor x2APIC Support: Disabled
Intel(R) Virtualization Technology (Intel VT): Disabled
Intel (R) VT-d: Disabled
SR-IOV: Disabled
Advanced Memory Protection: Online Spare Memory with Advanced ECC Support
Power Regulator: Dynamic Power Savings Mode
total used free shared buff/cache available
Mem: 93G 998M 92G 9.7M 169M 92G
Swap: 4.0G 0B 4.0G
./mlc_avx512 -Y
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1
0 81.5 137.3
1 132.9 81.6
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 101550.7
3:1 Reads-Writes : 122423.5
2:1 Reads-Writes : 123643.4
1:1 Reads-Writes : 123946.9
Stream-triad like: 100338.1
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1
0 51795.6 30250.4
1 30170.0 50789.5
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 93.47 91710.7
00002 93.39 91768.5
00008 93.48 91771.5
00015 93.49 91460.7
00050 93.60 86147.5
00100 88.45 66769.2
00200 85.89 45695.5
00300 85.45 35209.2
00400 85.12 28174.0
00500 84.18 23784.2
00700 82.89 17896.0
01000 82.07 13104.8
01300 81.90 10377.9
01700 81.71 8247.6
02500 81.66 5928.2
03500 81.65 4451.3
05000 81.60 3383.5
09000 81.45 2228.4
20000 81.19 1443.0
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 47.4
Local Socket L2->L2 HITM latency 47.4
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 112.6
1 114.1 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 179.0
1 178.1 -
./mlc_avx512 -Z
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1
0 81.7 136.3
1 132.6 75.8
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 103224.0
3:1 Reads-Writes : 117884.1
2:1 Reads-Writes : 123197.6
1:1 Reads-Writes : 122909.9
Stream-triad like: 95929.5
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1
0 51349.0 30597.5
1 30340.8 51559.5
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 92.20 91895.3
00002 92.26 91902.1
00008 91.58 87681.6
00015 92.70 88728.1
00050 91.73 83071.0
00100 87.27 64608.4
00200 87.20 44479.3
00300 85.37 34295.7
00400 85.13 27746.6
00500 83.84 22966.0
00700 82.59 17352.7
01000 81.45 12820.3
01300 80.62 10185.6
01700 80.86 8072.0
02500 80.20 5805.8
03500 79.78 4404.5
05000 79.43 3335.5
09000 78.53 2219.9
20000 77.91 1458.6
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 47.4
Local Socket L2->L2 HITM latency 47.5
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 112.6
1 114.1 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 176.1
1 178.1 -
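The effect of Online Spare on peak bandwidth is easier to see as a percentage. The sketch below uses the two `-Y` peak injection tables above (Advanced ECC as the baseline); the variable and function names are illustrative only, and since each figure comes from a single run, small differences may simply be noise.

```python
# Peak injection bandwidth (MB/sec) copied from the two `-Y` runs above.
ADVANCED_ECC = {
    "ALL Reads": 99954.8,
    "3:1 Reads-Writes": 145189.0,
    "2:1 Reads-Writes": 151137.4,
    "1:1 Reads-Writes": 150854.9,
    "Stream-triad like": 101469.8,
}
ONLINE_SPARE = {
    "ALL Reads": 101550.7,
    "3:1 Reads-Writes": 122423.5,
    "2:1 Reads-Writes": 123643.4,
    "1:1 Reads-Writes": 123946.9,
    "Stream-triad like": 100338.1,
}


def percent_change(baseline, other):
    """Relative change of `other` against `baseline`, in percent, per traffic type."""
    return {k: (other[k] - baseline[k]) / baseline[k] * 100.0 for k in baseline}


changes = percent_change(ADVANCED_ECC, ONLINE_SPARE)
for name, pct in changes.items():
    print(f"{name:<18} {pct:+6.1f}%")
```

With these figures the read-only and stream-triad results move by under 2 percent, while the mixed read-write ratios drop by roughly 16 to 18 percent.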
Static High Performance Mode
Workload Profile: High Performance Compute (HPC)
Intel (R) Hyperthreading Options: Disabled
Enabled Cores per Processor: 0
Processor x2APIC Support: Disabled
Intel(R) Virtualization Technology (Intel VT): Disabled
Intel (R) VT-d: Disabled
SR-IOV: Disabled
Advanced Memory Protection: Advanced ECC
Power Regulator: Static High Performance Mode
NUMA Group Size Optimization: Clustered
Sub-NUMA Clustering: Disabled
total used free shared buff/cache available
Mem: 188G 1.5G 186G 9.7M 174M 186G
Swap: 4.0G 0B 4.0G
./mlc_avx512 -Y
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1
0 81.5 136.6
1 133.0 80.8
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 99671.7
3:1 Reads-Writes : 144913.0
2:1 Reads-Writes : 151828.4
1:1 Reads-Writes : 150604.2
Stream-triad like: 101510.8
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1
0 50065.0 30928.7
1 30822.4 49750.0
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 88.36 88426.0
00002 88.28 88513.4
00008 88.39 88376.7
00015 88.86 87479.0
00050 91.10 84842.5
00100 85.64 65520.4
00200 84.52 45690.0
00300 84.02 35335.0
00400 82.86 28333.1
00500 82.59 23871.6
00700 81.90 17939.0
01000 81.59 13122.1
01300 81.42 10415.4
01700 81.48 8250.4
02500 81.36 5931.9
03500 81.35 4482.6
05000 81.22 3387.9
09000 81.06 2233.0
20000 80.91 1445.8
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 47.2
Local Socket L2->L2 HITM latency 47.3
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 112.6
1 113.9 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 176.2
1 177.6 -
./mlc_avx512 -Z
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1
0 81.5 136.6
1 133.0 80.8
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 103610.7
3:1 Reads-Writes : 138387.8
2:1 Reads-Writes : 149094.5
1:1 Reads-Writes : 149883.0
Stream-triad like: 97603.1
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1
0 51908.0 31130.9
1 31021.3 51681.4
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 87.97 92148.1
00002 87.94 91946.7
00008 86.79 87489.0
00015 88.32 87027.6
00050 90.93 84764.9
00100 85.72 66075.6
00200 84.60 45281.3
00300 83.64 34671.7
00400 83.08 27878.4
00500 82.57 23317.8
00700 81.79 17534.7
01000 81.58 12821.8
01300 81.56 10186.6
01700 81.61 8061.2
02500 81.48 5795.0
03500 81.40 4387.2
05000 81.27 3317.6
09000 81.06 2194.2
20000 80.94 1427.7
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 47.2
Local Socket L2->L2 HITM latency 47.3
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 112.6
1 113.9 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 176.5
1 178.1 -
Static High Performance Mode – Sub-NUMA
Workload Profile: High Performance Compute (HPC)
Intel (R) Hyperthreading Options: Disabled
Enabled Cores per Processor: 0
Processor x2APIC Support: Disabled
Intel(R) Virtualization Technology (Intel VT): Disabled
Intel (R) VT-d: Disabled
SR-IOV: Disabled
Advanced Memory Protection: Advanced ECC
Power Regulator: Static High Performance Mode
NUMA Group Size Optimization: Clustered
Sub-NUMA Clustering: Enabled
total used free shared buff/cache available
Mem: 188G 1.5G 186G 9.7M 176M 186G
Swap: 4.0G 0B 4.0G
./mlc_avx512 -Y
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1 2 3
0 73.2 81.9 131.4 139.6
1 81.4 73.2 132.8 140.7
2 128.6 138.5 73.1 81.4
3 132.4 140.6 81.4 80.0
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 113325.1
3:1 Reads-Writes : 153852.5
2:1 Reads-Writes : 158538.8
1:1 Reads-Writes : 163349.2
Stream-triad like: 106597.2
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1 2 3
0 28097.4 26815.3 18072.4 17292.1
1 27017.8 28428.9 18125.0 17314.9
2 18003.6 17296.8 28663.1 26939.6
3 17851.0 17180.2 26850.6 28090.9
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 77.56 100595.6
00002 77.57 100674.4
00008 77.55 100718.8
00015 77.55 99986.2
00050 77.43 92388.9
00100 75.33 76544.3
00200 74.09 45915.4
00300 74.25 36109.2
00400 74.10 28748.0
00500 74.02 24010.9
00700 73.74 18067.8
01000 73.61 13237.0
01300 73.53 10535.6
01700 73.40 8344.6
02500 73.25 6023.4
03500 73.16 4569.0
05000 73.13 3474.4
09000 73.11 2318.4
20000 73.07 1530.5
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 45.5
Local Socket L2->L2 HITM latency 45.6
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1 2 3
0 - 47.9 108.0 110.0
1 47.5 - 114.1 116.1
2 108.4 109.3 - 45.4
3 114.7 115.4 45.6 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1 2 3
0 - 47.3 169.9 177.3
1 45.6 - 168.8 176.3
2 173.2 177.3 - 48.4
3 173.7 177.7 48.6 -
./mlc_avx512 -Z
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1 2 3
0 73.2 81.6 131.0 139.6
1 81.4 73.2 132.8 140.7
2 128.6 138.5 73.1 81.4
3 132.4 140.6 81.4 80.0
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 116063.6
3:1 Reads-Writes : 146797.3
2:1 Reads-Writes : 155021.3
1:1 Reads-Writes : 160334.0
Stream-triad like: 102302.4
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1 2 3
0 28901.7 27470.7 17819.7 17063.9
1 27885.7 29067.0 17830.9 17038.7
2 17677.7 16976.0 29378.9 27692.9
3 17582.0 16918.8 27642.7 28928.8
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 76.67 103298.6
00002 76.67 103133.2
00008 76.59 97928.7
00015 77.14 99307.7
00050 77.64 91830.6
00100 75.32 76645.1
00200 74.28 45323.2
00300 74.23 35625.3
00400 74.10 28168.6
00500 74.55 23432.3
00700 73.77 17619.9
01000 73.59 12901.8
01300 73.53 10270.0
01700 73.48 8151.9
02500 73.26 5880.3
03500 73.16 4477.2
05000 73.12 3403.2
09000 73.10 2279.6
20000 73.09 1512.5
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 45.5
Local Socket L2->L2 HITM latency 45.6
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1 2 3
0 - 47.9 108.0 109.9
1 47.4 - 114.1 116.1
2 108.4 109.3 - 45.4
3 114.7 115.4 45.6 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1 2 3
0 - 47.3 169.9 177.4
1 45.6 - 168.8 176.3
2 173.3 177.3 - 48.4
3 173.7 177.7 48.6 -
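With Sub-NUMA Clustering enabled the idle latency table becomes a 4x4 matrix, which is easier to read when grouped by distance. The sketch below averages the `-Y` idle latencies from this section into three groups. It assumes that nodes 0 and 1 share one socket and nodes 2 and 3 the other, which the latencies suggest but the output does not state; `mean_by_distance` is a hypothetical helper.

```python
# Idle latencies (ns) from the `-Y` run with Sub-NUMA Clustering enabled.
IDLE_NS = [
    [73.2, 81.9, 131.4, 139.6],
    [81.4, 73.2, 132.8, 140.7],
    [128.6, 138.5, 73.1, 81.4],
    [132.4, 140.6, 81.4, 80.0],
]

# Assumption: NUMA nodes 0-1 belong to socket 0 and nodes 2-3 to socket 1.
SOCKET_OF = [0, 0, 1, 1]


def mean_by_distance(matrix, socket_of):
    """Average the latencies for same-node, same-socket and remote-socket pairs."""
    groups = {"same node": [], "same socket": [], "remote socket": []}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if i == j:
                key = "same node"
            elif socket_of[i] == socket_of[j]:
                key = "same socket"
            else:
                key = "remote socket"
            groups[key].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


summary = mean_by_distance(IDLE_NS, SOCKET_OF)
for key, value in summary.items():
    print(f"{key:<14} {value:6.1f} ns")
```

The same grouping can be applied to the bandwidth and cache-to-cache matrices by swapping in their values.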