Lab notes on 10 Gbit/s network tests
(C) 2007 Jan Wagner, Guifre Molera
27 June 2008 - SATA Port Multiplier (PM) tests.
A single 750GB SpinPoint F1 disk behind the port multiplier and the ADSA3GPX8-4E PCI-Express SATA controller gives an hdparm performance of 72 MB/s.
| Test | PCI-Express SATA port | native SATA port |
|---|---|---|
| hdparm single disk | 72 MB/s | 72.4 MB/s |
| wr-nexgen single disk | 611 Mb/s | 611 Mb/s |
| wr-nexgen multiple disks | 1620 Mb/s | 2337 Mb/s |
| hdparm RAID | 232 MB/s | 286 MB/s |
| wr-nexgen RAID | 1818 Mb/s | 2341 Mb/s |
01 July 2008 - SATA controller tests
By default the native SATA ports cannot detect the port multiplier; updating the BIOS or installing the latest sata_nv driver might help. The tests therefore continued with the PCI-Express SATA host controller (driver sata_sil24), which detects the PMP automatically after booting the PC. All data on the new Samsung 750 GB disks was erased so that the same tests could be run with them. Past tests showed a large difference between old and new disks connected to the native SATA ports.
| Test | PCI-E, raid-disks=5 | PCI-E, raid-disks=4 |
|---|---|---|
| hdparm -t /dev/md0 | 305.9 MB/s | 294 MB/s |
| wr-nexgen RAID disk | 2355 Mb/s | 2453 Mb/s |
| wr-nexgen to multiple disks | 2379 Mb/s | 2505 Mb/s |
As seen, the wr-nexgen rates with 4 disks are higher than with a fifth disk added to the PMP (only hdparm shows a slight gain with 5 disks). Possibly the PMP chip cannot correctly handle more than 4 disks.
21 July 2008 - SATA controller tests
./wr-nexgen 750000000 32768 1000000 32768 1 /dev/md0 1000000
HP SC44Ge / LSISAS1068E controller tests: all 4 internal + 4 external SATA disks are detected, provided that the correct SFF multi-lane to SATA converter is used. It turns out that port multipliers do not work; the controller still sees only one disk per PMP. Log files. Performance with 10 disks configured together, 6 behind the HP and 4 behind the nForce680i native ports: 2780 Mbps average. With 8 disks behind the HP in hardware RAID, performance is around 1.8 Gbps.
24 July 2008 - SATA controller tests
The Abidal nForce 680a machine was used for the following tests. Two Addonics PMs were connected via the Addonics ADSA3GPX8-4E 4xeSATA card in a PCIe x8 slot. We did some cross-set testing over different ports using wr-nexgen with 32kB blocks, 128MB RAM and a 20312.50MB file, writing to raw /dev/sdX, to raw /dev/md0 and finally to an XFS file system. For the wr-nexgen tests see the log file. Results:
Partitions involved: sdb1, sdc1, sdd1, sde1 and sdf1, sdg1, sdh1, sdi1.

| Run | Target | Time (s) | Rate (Mbit/s, dec) | Log |
|---|---|---|---|---|
| 1 | /dev/sdX | 248.825253 | 684.792230 | log |
| 2 | /dev/sdX | 242.481281 | 702.708264 | log |
| 3 | /dev/md0 | 122.601060 | 1389.821590 | log |
| 4 | /dev/sdX | 276.815855 | 615.548557 | log |
| 5 | /dev/md0 | 119.279701 | 1428.52154 | log |
| 6 | /dev/md0 | 82.755253 | 2059.006458 | log |
| 7 | /dev/md0 | 82.314915 | 2070.020963 | log |
| 8 | /raid (XFS) | 85.464052 | 1993.745863 | log |
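The rates in the table can be reproduced from the file size and the elapsed time. A minimal sketch, assuming the 20312.50MB file size is in binary megabytes (MiB) while the reported Mbits(dec)/s are decimal megabits; the helper name is hypothetical and not part of wr-nexgen:

```python
# Hypothetical helper: recompute a decimal Mbit/s rate from a file size
# given in binary megabytes (MiB) and the elapsed write time in seconds.
def mbit_dec_per_s(size_mib: float, seconds: float) -> float:
    bits = size_mib * 1024 * 1024 * 8   # MiB -> bytes -> bits
    return bits / seconds / 1e6          # bits/s -> decimal Mbit/s


if __name__ == "__main__":
    for secs in (248.825253, 82.314915):
        print(f"{secs:11.6f} s -> {mbit_dec_per_s(20312.5, secs):.3f} Mbit/s")
```

With the times of runs 1 and 7 this prints 684.792 and 2070.021 Mbit/s, matching the logged values, which supports the MiB assumption.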
It is curious to see the ADSA3GPX8-4E performance level off at around 2 Gbps. Perhaps the PCI-X to PCIe bridge on the card causes the low performance, or maybe each eSATA port can sustain only about 1 Gbps. This can be checked later by connecting 4 instead of 2 PMs to the eSATA ports and distributing the same 8 disks over the 4 PMs. Update: we re-ran the last test and reached around 4 Gbps when writing to a raw /dev/md0 that was neither XFS-formatted nor mounted.
Some thoughts on 4 Gbps: at an externally pushed rate of 4 Gbps (512 MB/s), the 20 GByte file takes 40 seconds. Abidal has 3800 MB of free RAM; if half of it is available for buffering, the buffer holds about 4 seconds of data (1900 MB / 512 MB/s ≈ 3.7 s). The wr-nexgen output should therefore report interval-average rates over intervals shorter than 2 seconds. If any of these <2 s interval averages drops below 4 Gbps, then 4 Gbps network-to-disk recording will not work without loss.
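The "about 4 seconds" figure is the time a complete write stall would take to fill the buffer at the 4 Gbps input rate. A minimal sketch of a buffer fill-level model for checking a series of reported interval averages against the buffer size, assuming decimal units (4 Gbps = 500 MB/s rather than the 512 MB/s above), a buffer that starts empty, and the 0.5 s reporting interval of the newer wr-nexgen; the function is hypothetical:

```python
# Minimal fill-level model of a RAM buffer between a constant-rate network
# input and a disk writer whose speed varies per reporting interval.
# Assumptions (not from wr-nexgen): decimal units, 1 MB = 8 Mbit, the buffer
# starts empty and drains freely whenever the disks are faster than the input.
def buffer_survives(input_mbps, write_mbps_per_interval, interval_s, buffer_mb):
    """Return (ok, fill_mb): ok is False as soon as the buffer overflows,
    otherwise fill_mb is the peak fill level reached."""
    fill_mb = 0.0
    peak_mb = 0.0
    for write_mbps in write_mbps_per_interval:
        # Net Mbit accumulated during this interval, converted to MB.
        fill_mb += (input_mbps - write_mbps) * interval_s / 8.0
        fill_mb = max(fill_mb, 0.0)   # the buffer cannot go negative
        peak_mb = max(peak_mb, fill_mb)
        if fill_mb > buffer_mb:
            return False, fill_mb
    return True, peak_mb


if __name__ == "__main__":
    # 4 Gbps input, 1900 MB buffer (half of 3800 MB), 0.5 s intervals.
    # A full write stall fills 1900 MB in 1900 / 500 = 3.8 s,
    # i.e. during the 8th stalled interval.
    print(buffer_survives(4000, [0] * 7, 0.5, 1900))
    print(buffer_survives(4000, [0] * 8, 0.5, 1900))
```

The first call returns (True, 1750.0) and the second (False, 2000.0), consistent with the roughly 4 second estimate.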
25 July 2008 - SATA controller tests
Most of the previous tests used 2, 4 or 8 disks. The following tests try to push more than 4 Gbps to larger devices. As we do not yet have more than 2 PMPs compatible with the PCI-E host controller, the disks were connected to the motherboard native ports, the PCI-E host controller and/or the HP controller. The following table shows the number of disks, the kind of connection used and the rates achieved:
| Disks | hdparm | wr-nexgen |
|---|---|---|
| 8*native | 425 MB/s | |
| 8*native + 4*PCI-e | 413 MB/s | 4535 Mbits/s |
| 4*native + 4*PCI-e | 428 MB/s | 3800 Mbits/s |
| 8*PCI-e | 455 MB/s | 3815 Mbits/s |
| 4*native + 8*PCI-e | 440 MB/s | 4719 Mbits/s |
| 8*native + 8*PCI-e | 481 MB/s | 4600 Mbits/s |
An attempt to use 16 disks, 8 connected to the PCI-E card and 8 to the native ports, did not clearly solve the problem. The speed is still a bit over 4 Gbps, far from the theoretical 8 Gbps. The total capacity of the RAID is 10.2 TB.
05 August 2008 - RocketRaid 2522
Got a HighPoint RocketRaid 2522 (2 x miniSAS; the ads claim PMP support for up to 40 disks). First impressions: the Linux driver seems to come in mixed binary and source form, and modinfo rr2522 shows the license as Proprietary. So far the Linux side sees no individual disks, only the RAID array(s) configured in the RocketRaid BIOS. PMP support works as advertised: the RocketRaid BIOS detects all disks behind a PMP.
For the first test, we used one mini-SAS to SFF cable connected to an AD4SAML (SFF-to-SATA). Each of its four SATA ports was hooked to one AD5SAPM 5-port PMP. Eleven (11) SpinPoint F1 750GB disks were available. The theoretical peak rate would be 100MB/s*8bit*11 = 8800 Mbit/s. To make the best use of the per-link bandwidth, the disks were scattered over the four PMPs (theoretical rate 4 x 300MB/s*8bit = 9600 Mbit/s) in a 3 + 3 + 3 + 2 configuration. The disks were combined in the HighPoint BIOS into one single RAID-0. The chunk size was neither configurable nor displayed.
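The peak-rate arithmetic can be combined per PMP: each PMP uplink delivers at most the smaller of its disks' aggregate rate and the link capacity. A minimal sketch, assuming about 100 MB/s per disk and about 300 MB/s usable per 3 Gbit/s SATA link as above, and ignoring any limit of the miniSAS connector, the controller or the PCIe slot; the function name is hypothetical:

```python
# Sketch of the per-PMP bandwidth bound: each PMP uplink carries at most
# min(disks behind it * disk rate, link capacity). Controller and bus
# limits are deliberately ignored.
def theoretical_peak_mbit_s(disks_per_pmp, disk_mb_s=100, link_mb_s=300):
    """Sum over PMPs of min(disk aggregate, link capacity), in Mbit/s."""
    total_mb_s = 0
    for n_disks in disks_per_pmp:
        total_mb_s += min(n_disks * disk_mb_s, link_mb_s)
    return total_mb_s * 8   # MB/s -> Mbit/s


if __name__ == "__main__":
    print(theoretical_peak_mbit_s([3, 3, 3, 2]))  # the 11-disk layout
    print(theoretical_peak_mbit_s([3, 3, 3, 3]))  # 12 disks, 3 per PMP
    print(theoretical_peak_mbit_s([5, 5]))        # 5 disks saturate one link
```

The 3 + 3 + 3 + 2 layout gives 8800 Mbit/s, the disk-limited figure above; the last call shows that 5 disks behind one PMP are already link-limited in this model.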
The quick test
With wr-nexgen writing onto the raw blank /dev/sdb device, the rate was initially between 4078 Mbit/s and 4966 Mbit/s, but then had long dips down to 3500 Mbit/s. CPU core loads were quite steady at 90%:43%:25%:0%sy (158%sy in total).
The second test: 12 disks, 3 behind each of the 4 PMPs. A hardware RAID-0 was created for each 3-disk PMP, and these were then combined into an mdadm md0 with a chunk size of 512k. The wr-nexgen write rate to raw /dev/md0 improved marginally: peak 4900 Mbit/s, low 4260 Mbit/s. CPU core loads were quite steady at 85%:50%:25%:0%sy (160%sy), and the 25% core had an additional 15%hi 20%si load. With a 64k chunk size the performance is lower, only 3700 Mbit/s already at the beginning, with 50%:35%:15%:0%sy (100%sy) CPU. With a 2048k chunk size and 256kB writes, the rate is 5090 Mbit/s peak and 4540 Mbit/s low, with steady CPU 90%:40%:25%:0% (155%sy), plus 15%hi 19%si on the 25% core. Writing to XFS on the same RAID gives 4450 Mbit/s peak and 3960 Mbit/s low, with highly varying CPU 75%:40%:25%:25%sy (165%sy) and one core additionally at 8%hi 2%si.
The third test: the same 12 disks in the 3 disks/PMP grouping, but only one pair of PMPs is taken into the software RAID, i.e. just 6 disks, with a 1024k chunk size. This 2x3-disk setup writes at 3400 Mbit/s peak and 3130 Mbit/s low. Curiously, the earlier 4x3-disk and 1x12-disk (actually 1x11) setups did not come even close to double this rate.
One possible reason for the low rate: the RocketRaid 2522 has two mini-SAS connectors, each with its own PCI-X controller, and we are using only one of them, so the on-board PCI-X bus or CPU may saturate at around 4-5 Gbps.
Still, close to 4300 Mbit/s to XFS is achievable using four 3-disk hardware RAIDs combined into one software RAID with a 1024k/2048k chunk size and large (256kB) writes.
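The chunk-size effect can be reasoned about with the striping rule of a RAID-0 built from equal-size members: consecutive chunks go round-robin to the members, so a write is split wherever it crosses a chunk boundary. A minimal sketch, assuming equal-size members (the simple striped layout md uses in that case); the helper is hypothetical and ignores any request merging done by the block layer:

```python
# Split one logical write on a striped RAID-0 of n equal-size members into
# per-member pieces (member index, offset on member, length).
def raid0_split(offset, length, chunk, n_members):
    pieces = []
    end = offset + length
    while offset < end:
        stripe_unit = offset // chunk             # global chunk number
        member = stripe_unit % n_members          # round-robin member
        within = offset % chunk                   # position inside the chunk
        member_offset = (stripe_unit // n_members) * chunk + within
        size = min(chunk - within, end - offset)  # stop at the chunk boundary
        pieces.append((member, member_offset, size))
        offset += size
    return pieces


if __name__ == "__main__":
    KiB = 1024
    # A 256 kB write on 4 members: split 4 ways with 64k chunks,
    # but a single member request with 2048k chunks.
    print(len(raid0_split(0, 256 * KiB, 64 * KiB, 4)))
    print(len(raid0_split(0, 256 * KiB, 2048 * KiB, 4)))
```

Smaller chunks turn each large write into more, smaller member requests, which is one plausible contributor to the lower 64k-chunk rate; this model does not prove it.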
| RocketRaid config | wr-nexgen Mbit/s peak | wr-nexgen Mbit/s low | Average |
|---|---|---|---|
| theoretical 12-disk performance | disks: 100MB/s*8bit*12 = 9600 Mbit/s | links: 4 SATA*300MB/s*8bit = 9600 Mbit/s | |
| 11 disks, 4 PMP 3+3+3+2, 1 HW RAID, /dev/sdb | 4093 Mbit/s XFS 4966 Mbit/s raw | 3893 Mbit/s XFS 3500/4078 Mbit/s raw | |
| 12 disks, 4 PMP 3+3+3+3, 4 HW RAIDs, /dev/md0 512k | 4900 Mbit/s raw | 4260 Mbit/s raw | |
| 12 disks, 4 PMP 3+3+3+3, 4 HW RAIDs, /dev/md0 64k | 3700 Mbit/s raw | ... | |
| 12 disks, 4 PMP 3+3+3+3, 4 HW RAIDs, /dev/md0 2048k | 5090 Mbit/s raw if 256k writes, 4450 Mbit/s XFS in 256k writes | 4540 Mbit/s raw 256k, 3960 Mbit/s XFS in 256k writes | |
| 6 disks, 4 PMP 3+3+*+*, 2 HW RAIDs, /dev/md0 1024k | 3400 Mbit/s raw | 3130 Mbit/s raw | |
| 5 disks, 1 PMP, 1 HW RAID, /dev/sdb | 1856 Mbit/s raw | 1736 Mbit/s raw | 1794 Mbit/s |
| 4 disks, 1 PMP, 1 HW RAID, /dev/sdc | 1942 Mbit/s raw | 1644 Mbit/s raw | 1771 Mbit/s |
Log files are in 05082008.
06 August 2008
ADSA3GPX8-4E 4xeSATA
Plugged the ADSA3GPX8-4E 4xeSATA PCIe card into the Abidal nForce680i. The SATA PMPs could be connected with eSATA-to-SATA converter cables. Replaced the old wr-nexgen on Abidal with the newer wr-nexgen.c, which reports the rate in 0.5 s intervals and has XFS real-time support. Updated the disktune.sh script: it can disable NCQ, replace the fancier I/O schedulers with a simple one (CFQ->NOOP), tune the sector/write size, and change other interesting settings. In the end, however, none of the tunings gave any noticeable benefit in write performance.
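The kinds of tunings listed above map onto per-disk Linux sysfs attributes. A hypothetical sketch, not the actual disktune.sh, assuming the standard /sys/block/<disk>/queue/scheduler, /sys/block/<disk>/device/queue_depth (a depth of 1 effectively disables NCQ on libata disks) and /sys/block/<disk>/queue/max_sectors_kb files; on newer multi-queue kernels the NOOP scheduler is called "none". It prints the equivalent shell commands by default and needs root to apply them:

```python
# Hypothetical sketch (not disktune.sh): plan and optionally apply per-disk
# block-layer tunings through Linux sysfs.
from pathlib import Path


def tuning_plan(disks, scheduler="noop", disable_ncq=True, max_sectors_kb=None):
    """Return (sysfs_path, value) pairs for disks given as names like 'sdb'."""
    plan = []
    for disk in disks:
        base = Path("/sys/block") / disk
        plan.append((base / "queue" / "scheduler", scheduler))
        if disable_ncq:
            # Queue depth 1 leaves no room for queued (NCQ) commands.
            plan.append((base / "device" / "queue_depth", "1"))
        if max_sectors_kb is not None:
            # Upper bound on the request size the block layer sends to the disk.
            plan.append((base / "queue" / "max_sectors_kb", str(max_sectors_kb)))
    return plan


def apply_plan(plan, dry_run=True):
    """Print the equivalent shell commands, or write the values (root only)."""
    for path, value in plan:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            path.write_text(value)


if __name__ == "__main__":
    apply_plan(tuning_plan(["sdb", "sdc"], max_sectors_kb=512))
```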
| Configuration | Avg Mbit/s (dec) | Max Mbit/s (dec) | Min Mbit/s (dec) | Write config | Stopped (Ctrl-C) after | Logs |
|---|---|---|---|---|---|---|
| 2 PMP : 5+5 disks | 3734 | 3843 | 3733 | 32k writes | 93s | |
| 3 PMP : 5+5+2 | 4374 | 5110 | 3962 | 32k writes | 566s | |
| 3 PMP : 5+5+2 | 4521 | 5490 | 4313 | 128k writes | 142s | |
| 3 PMP : 5+5+2 | 4553 | 5335 | 4402 | 128k writes, NOOP scheduler | 107s | |
| 3 PMP : 4+4+4 | 4717 | 5285 | 3939 | 128k writes | 1778s | log plot |
| 3 PMP : 4+4+4 | 3985 | 4403 | 2615 | 128k writes, XFS | 507s | log plot |
| 4 PMP : 3+3+3+3 | 4870 | 5334 | 3871 | 128k writes | 844s | log plot |
Entire screenlog.
Revisiting the RocketRaid 2522
The amug.org site has a review of the 2522 controller. They report average writes at 699 MB/s (5600 Mbit/s) with a 4-PM 16-disk setup. Their 2-PM 10-disk setup does 427 MB/s (3420 Mbit/s). Yesterday's 4-PM 11-disk RAID0 was 3500..4966 Mbit/s (438..620 MB/s). Yesterday's 4-PM 12-disk setup with 4 HW RAID0s in a software /dev/md0 was 4500..5000 Mbit/s (562..625 MB/s).
A RocketRaid 2522 setup with 2 PMs at each of the 2 miniSAS connectors and 3+3+3+3 disks configured into 4 HW RAID0s gives very fluctuating rates of 3470..5180 Mbit/s when using 1 wr-nexgen that writes blocks across the 4 RAIDs. With 4 wr-nexgen instances writing independently to the 4 RAIDs, each of them runs at a quite stable 1540 Mbit/s (6160 Mbit/s aggregate).
A RocketRaid 2522 setup with 4 PMs behind a single miniSAS connector and 3+3+3+3 disks configured into 4 HW RAID0s shows no noticeable difference compared to the dual-miniSAS case: the four writers run at 1530 Mbit/s each. See the corresponding 06aug2008 log and combined plot. CPU load: three cores report 50%sy 30%wa 0%hi 0%si, the fourth reports 25%sy 2%wa 25%hi 45%si. Eight 'pdflush' instances are running.
The single-miniSAS and dual-miniSAS rates are essentially identical, so the processor or PCI-X buses on the RocketRaid do not appear to be the bottleneck. Something else limits the single-writer performance to about 2 Gbps below the expected 4 * 1530 Mbps = 6.1 Gbps. The 12 on-board SATA ports of the nForce680a achieved around 5.5 Gbps earlier.
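The single-writer versus 4-independent-writers comparison can be repeated with a small harness. A minimal sketch, assuming plain files as targets and Python threads (os.write releases the GIL during the system call, so the writes can overlap); it is a hypothetical stand-in for running several wr-nexgen instances, not wr-nexgen itself. Pointing it at the raw RAID block devices would overwrite their contents:

```python
# Hypothetical harness: one independent writer thread per target, each
# reporting its own decimal Mbit/s; the aggregate is the sum of the rates.
import os
import tempfile
import threading
import time


def _writer(path, total_bytes, block_size, results, idx):
    """Write total_bytes of zeros to path and store (bytes_written, Mbit/s)."""
    block = bytes(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        written = 0
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            written += os.write(fd, block[:n])
        os.fsync(fd)  # include the flush to disk, not just the page cache
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    results[idx] = (written, written * 8 / elapsed / 1e6)


def parallel_write(paths, total_bytes, block_size=128 * 1024):
    """Run one writer per path concurrently; return per-writer results and the summed rate."""
    results = [None] * len(paths)
    threads = [
        threading.Thread(target=_writer, args=(p, total_bytes, block_size, results, i))
        for i, p in enumerate(paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, sum(rate for _, rate in results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        targets = [os.path.join(d, f"raid{i}.bin") for i in range(4)]
        per_writer, total = parallel_write(targets, 16 * 1024 * 1024)
        for path, (_, rate) in zip(targets, per_writer):
            print(f"{os.path.basename(path)}: {rate:.0f} Mbit/s")
        print(f"aggregate: {total:.0f} Mbit/s")
```

Summing per-writer rates, as in the 4 x 1540 = 6160 Mbit/s figure, is only meaningful when the writers run concurrently for similar durations.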
Assembling the 4 hardware RAID0s behind the single miniSAS into /dev/md0: the initial write rate is high but quickly drops to 3550 Mbit/s. Interestingly, there is only 1 'pdflush' instance. The rate occasionally peaks at 5300 Mbit/s for several tens of seconds, and during these peaks 'pdflush' consumes 100% CPU (6%us 54%sy 14%wa 26%si). When 'pdflush' goes idle, the rate drops back to 3550 Mbit/s.
A good reference is the paper Extreme Linux Performance Monitoring Part II. Statistics are available for the configuration with 4 hardware RAID0s in 1 software RAID (the run at the end of the 06aug2008 log), as well as combined vmstat and other output for the single wr-nexgen test.
12 August 2008
Some tests on Abidal (AMD2212) varying the CPU frequency, the HyperTransport frequency and the number of CPU cores ('nosmp' or disabling individual cores via /sys/devices/), to see whether it makes sense to use POSIX AIO. Configurations: 2 PMPs with 5+5 disks and 3 PMPs with 5+5+2 disks. The rate shown is the average rate after 140 seconds.
The screenlog is summarized in the 12aug2008_summary.txt.
| Disks (sdX) / target | 1GHz | normal | HT | DDR2 | nosmp | 2MB |
|---|---|---|---|---|---|---|
| bgchdiejfk | | | | | | |
| xfs | 2652 | 3761 | 3780 | n/a | 3566 | n/a |
| raw | 3156 | 3809 | 3794 | 3802 | 3557 | 3802 |
| bcdefghijk | | | | | | |
| xfs | 2618 | 3776 | | | | |
| raw | 2980 | 3807 | | | | |
| bgchldiejfkm | | | | | | |
| xfs | n/a | | | | | |
| raw | 4530 | | | | | |
1GHz: CPU clock reduced to half; HT: HyperTransport at 1 GHz CPU-CPU and 600 MHz CPU-SB1/SB2; DDR2: memory clock raised from 667 MHz to 800 MHz; nosmp: no multiprocessing, just 1 core for OS+apps; 2MB: 2 MByte write size.
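The order bgchdiejfk alternates between sdb..sdf and sdg..sdk, whereas bcdefghijk is sequential. A minimal sketch for generating such an interleaved member list and the matching mdadm command, assuming (not stated in the notes) that these labels are md0 member orders, that sdb..sdf sit behind the first PMP and sdg..sdk behind the second, and that partition 1 of each disk is used; the helpers and the 1024k chunk default are hypothetical, and the command is only built, never run:

```python
# Round-robin interleaving of disks across port multipliers, plus an
# mdadm RAID0 create command built from the resulting order.
from itertools import zip_longest


def interleave(groups):
    """First disk of each PMP, then the second of each, and so on."""
    order = []
    for row in zip_longest(*groups):
        order.extend(letter for letter in row if letter is not None)
    return order


def mdadm_raid0_cmd(letters, chunk_kib=1024, md_dev="/dev/md0", part="1"):
    """Build (not run) an mdadm --create argument list over /dev/sd<letter><part>."""
    members = [f"/dev/sd{letter}{part}" for letter in letters]
    return ["mdadm", "--create", md_dev, "--level=0",
            f"--raid-devices={len(members)}", f"--chunk={chunk_kib}", *members]


if __name__ == "__main__":
    order = interleave([list("bcdef"), list("ghijk")])
    print("".join(order))
    print(" ".join(mdadm_raid0_cmd(order)))
```

For the two 5-disk groups this yields bgchdiejfk. A plain round-robin over 5+5+2 groups gives bglchmdiejfk, which is not the bgchldiejfkm order used in the 12-disk run.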
9 September 2008
Useful commands:
$ watch -n2 iostat -d -m /dev/md0 /dev/sd{a,b,c,d} 1 2
$ vmstat 5
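As a lightweight alternative to polling iostat, the write rate of a device can be computed from two snapshots of /proc/diskstats, whose tenth field is the number of sectors written (always counted in 512-byte units). A minimal sketch; the helper names are hypothetical and "md0" is only an example device:

```python
# Compute a device's write rate from two /proc/diskstats snapshots.
import time


def sectors_written(diskstats_text, device):
    """Return the 'sectors written' counter for device from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # major minor name reads reads_merged sectors_read ms_reading
        # writes writes_merged sectors_written ...
        if len(fields) > 9 and fields[2] == device:
            return int(fields[9])
    raise KeyError(device)


def write_rate_mbit_s(before, after, device, interval_s):
    """Decimal Mbit/s written between two snapshots taken interval_s apart."""
    delta = sectors_written(after, device) - sectors_written(before, device)
    return delta * 512 * 8 / interval_s / 1e6


def sample(device="md0", interval_s=1.0):
    """Take two snapshots interval_s apart and return the write rate."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return write_rate_mbit_s(before, after, device, interval_s)


if __name__ == "__main__":
    try:
        print(f"md0: {sample('md0'):.0f} Mbit/s written")
    except (KeyError, OSError) as err:
        print(f"cannot sample: {err!r}")
```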
11 September 2008
Short summary of the three motherboards tested in the last few days: Asus Striker II Extreme, Asus Rampage Extreme and Asus P5Q Pro.
The first board to fall into our hands was the P5Q Pro, and the main purpose was to test the functionality of its new ICH10R southbridge chip. Even though the specification does not clearly claim PMP support, we still gave Intel a chance ;). As the R suffix suggests, the chip has PMP support, BUT it is not FIS-based, so it does not help at all: a single disk (114 MB/s) is faster than 4 disks behind the PMP (104 MB/s).
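Why non-FIS-based PMP support does not help can be shown with an idealized model. With command-based switching the host talks to only one disk behind the PMP at a time, so the aggregate cannot exceed one disk; with FIS-based switching all disks can transfer concurrently, limited only by the host link. A minimal sketch, assuming about 300 MB/s per 3 Gbit/s link and ignoring switching overhead (which is why the measured 104 MB/s is a bit below the single-disk rate); the function is hypothetical:

```python
# Idealized aggregate rate of n_disks behind one SATA port multiplier.
def pmp_aggregate_mb_s(n_disks, disk_mb_s, link_mb_s=300.0, fis_based=True):
    """FIS-based: disks overlap, capped by the link.
    Command-based: one disk at a time, so at most one disk's rate.
    Switching overhead is ignored."""
    if fis_based:
        return min(n_disks * disk_mb_s, link_mb_s)
    return min(disk_mb_s, link_mb_s)


if __name__ == "__main__":
    for fis in (False, True):
        print(fis, pmp_aggregate_mb_s(4, 114, fis_based=fis))
```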
Oops, I forgot to mention that we got new disks: Samsung F1 1 TB. The benchmarks show higher performance than the previous 750GB model. Returning to the P5Q Pro board, 6 disks give a total output rate of 425 MB/s. With some tweaking (disabling NCQ, see the file disktune.sh) we can get 540 MB/s, but not continuously. Other tests, such as the 10Gbps board + Silicon PCI-E controller, gave very good results. So it is a cheap option that gives good results for a 4Gbps recorder system, but with limitations.
Next to be tested was the Striker II. We had a few memory problems with the DDR3 modules in both boards; after struggling for a while we realized that our memtest version was too old for the new RAM modules, and after upgrading memtest to the latest version the problems disappeared. Single disk: 112 MB/s (900 Mbps); 6 disks: 451 MB/s (4800 Mbps/4675 Mbps).
| Disks | Distribution | MB/s | Mbps |
|---|---|---|---|
| 10 | 6+4 | 618 | 6991/6550 |
| 9 | 3+3+3 | 630 | 5160 |
| 9 | 3+2+2+2 | 618 | 5150 |
| 10 | 3+3+2+2 | 666 | 5270 |
| 12 | 3+3+3+3 | 681 | 5519 |
Chart summary
| L1N 64 | P5Q Pro | Striker II | Rampage | |
| single disk | 114 MB/s | 114 MB/s | 114 MB/s | |
| six disks | 425 MB/s | 451 MB/s |
Sustained recording rate in RAID disks for different disk controllers.
07 July 2009
Did some tests with newer kernel versions such as Ubuntu Karmic 2.6.30. As these new kernels include support for the ext4 file system, I benchmarked it as well. The kernels were taken from http://kernel.ubuntu.com/~kernel-ppa/mainline/ and installed on the Abidal 2xAMD2212 computer, which still runs Ubuntu Intrepid.
The tests used a 20 x 1TB disk RAID0 (4G-EXPReS) connected to the Addonics 4*eSATA controller. Because the current ext4 tools are limited to file systems of at most 16TB, the 20TB were GPT-partitioned to contain only a 14TB partition. The power draw of the diskpack has increased over time for unknown reasons, and the +5V supply to the PMP boards inside the diskpack is now only around 4.8V when idle and 4.65V under load. This seems to have caused some block I/O outages and write 'pauses' in the logs.
The ext4 file system was formatted with supposedly optimal settings for a 20-disk array with a 1024kB chunk size: -b 4096 -E stride=256,stripe-width=$((20*256)) -i 131072
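The stride and stripe-width values follow from two rules: stride is the RAID chunk size expressed in file system blocks, and stripe-width is stride times the number of data disks (all 20 in a RAID0). A minimal sketch that derives them and builds the matching mkfs.ext4 argument list, keeping the stripe-width spelling used above; "/dev/sdX1" is a placeholder for the 14TB partition and the helper names are hypothetical:

```python
# Derive ext4 RAID alignment options and build (not run) a mkfs.ext4 command.
def ext4_stripe_params(chunk_kib, block_bytes, data_disks):
    """stride = chunk size in fs blocks; stripe-width = stride * data disks."""
    stride = chunk_kib * 1024 // block_bytes
    return stride, stride * data_disks


def mkfs_ext4_args(device, chunk_kib=1024, block_bytes=4096,
                   data_disks=20, bytes_per_inode=131072):
    stride, width = ext4_stripe_params(chunk_kib, block_bytes, data_disks)
    return ["mkfs.ext4", "-b", str(block_bytes),
            "-E", f"stride={stride},stripe-width={width}",
            "-i", str(bytes_per_inode), device]


if __name__ == "__main__":
    print(" ".join(mkfs_ext4_args("/dev/sdX1")))
```

For the 20-disk, 1024kB-chunk array this gives stride=256 and stripe-width=5120, the same values as the $((20*256)) expression above.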
The resulting files can be found under matlab_logplot.
This work has received financial support under the EU FP6 Integrated Infrastructure Initiative contract number #026642, EXPReS.


