4G-EXPReS Linux RAID Data Acquisition
Content
1 - Introduction
2 - Use Cases
3 - Recommended Hardware
4 - Building A RAID System
5 - Software Utilities
6 - iBob Example Firmware (VDIF)
7 - Validation
1 - Introduction
A normal Linux system can be equipped to record 10 GbE network data to RAID at over 4 Gbps. In our 4G-EXPReS system, Port Multiplier-capable Serial ATA controllers are used to expand the high-speed recording capacity at will. The recorder can be used to capture network traffic such as real-time VLBI data streams onto disk. Tsunami and other transfer software allow fast file transfer over 10 GbE links.
The multilane/external RAID variant of 4G-EXPReS requires only a suitable SATA PCI Express controller card. Thus external 4+ Gbps RAID support can be added to almost any VLBI data acquisition computer, including Mark5A, Mark5B and PC-EVN.
At Metsähovi we have used the 4G-EXPReS successfully to record low-speed water maser and spacecraft observations, have performed Tsunami-UDP file transfers at up to 7 Gbps, and have captured 2x2048 Mbps VLBI observations from iBOB UDP VDIF streams.
When the European dBBC digital baseband converter becomes available with FiLa10G, we will move entirely to 10G network-to-disk recording using the hardware of our 4G-EXPReS concept.
Similar inexpensive storage has been used at Backblaze; see "Petabytes on a budget: How to build cheap cloud storage".
2 - Use Cases
The 4G-EXPReS can be built with two uses in mind: local data storage and externally-connected shippable disk packs.
If the disks do not need to be shipped, tens of SATA disks can be crammed into the same computer rack enclosure. What you are going to build is a fancy custom 10GbE NAS storage system. As you choose the consumer components yourself, the price and performance are significantly better than those of enterprise 10G SAS storage systems.
If the disks are going to be shipped, a 10- and 20-disk luggable disk pack design is available. It can be connected to the computer using a standard multilane SAS/SATA cable. It is a portable RAID system that contains only disks and minimal electronics and no power supply, so it weighs essentially only as much as the SATA disks themselves. (TODO: links, photos, building cost).
3 - Recommended Hardware
The most important hardware components for a 4G-capable RAID recorder that can be attached to a multilane diskpack are listed below (as of 06/2009):
- System: an Intel Nehalem / Core i7 generation system with 4 GB RAM and your own preferred Linux distro; FreeBSD and OpenSolaris are known not to work (~$1000)
- Power: a normal ATX PSU for the motherboard, a second single-rail(!) 850W ATX PSU such as a Corsair CMPSU-850TX for the diskpack
- RAID controller: multilane SATA controller such as the Addonics ADSA3GPX8-ML (1 x $185)
- SATA disk cages: for a normal RAID instead of a diskpack you can use 5-disk SATA cages like the SNT-3051SATA, CSPC-SK335B or Icy Box trayless cages. If you plan to use external diskpacks, getting cages for fun is up to you.
- 10G NIC: any popular 10G network card such as Chelsio N310E-BT or N320E-CXA ($720)
- SAS multilane cable: a 1.5m cable should be sufficient ($60)
- Power cable: up to 2m of 8-strand heavy cable, one Neutrik Speakon NL8FC 8-pole latch lock power plug, Farnell #724567 ($15)
The diskpack consists of:
- Mechanics: two end plates, eight mounting rails, one middle plate for port multipliers
- SATA disks: a bunch of 10..20 non-green 7200rpm SATA disks with high capacity
- Multilane to SATA bridge board: one external bridge board not intended for a PCI slot, Addonics AD4SAML (1 x $29)
- Port Multiplier boards: four such boards per diskpack, the Addonics AD5SAPM 5x1 Internal is suitable (4 x $75)
- Power connector: Neutrik Speakon NL8MPR 8-pole chassis connector, Farnell #1108200 or #724579, Mouser 538-67926-0011 and xx ($10)
- Multilane connector: one SFF-8470 female chassis connector common in External SAS
- SATA power connectors: 15-pin SATA IDT Feed Through connector, Molex 67926-0001 and 67926-0041 cover, Digikey WM19024-ND and WM19025-ND (20 x $3)
- SATA cables: lots of short ones (20 x $5)
- Carrying handle: up to preference, for example digikey#?
The disks must be air-cooled. The diskpack should be placed onto any rack fan tray that a) extends sufficiently deep into the rack and b) is mechanically rigid enough to carry 20kg. Not many sliding fan trays meet the 20kg carrying requirement, so we made two fan tray options. The first takes an empty 1U "pizza box" server enclosure and has holes for 12cm fans drilled into the top cover. The second takes a solid, sliding rack shelf to which we added 12cm mains/230VAC fans and a mains power switch.
4 - Building A RAID System
For a fixed RAID computer: first connect as many of the SATA disks to the motherboard SATA as possible. Additional disks to reach 12 or 20 disks should be connected behind a SATA controller. All disks should be powered from the ATX PSU. The PSU should have a lot of Molex/SATA power connectors to feed the usually double power connectors on your disk cages. The PSU can be normal or single-rail.
For a multilane RAID computer: install the multilane card. Install the single-rail ATX power supply into the computer enclosure, but do not connect it to the motherboard. Instead, split enough 12V-5V Molex and 12V PCIe/AUX cables to feed the diskpack power supply cable.
Suggested mechanical design for a 20-disk portable multilane cage: diskpack_v1.0.zip (x MB, SolidWorks). The only requirement for compatibility is that the cage has: 1 x SFF-8470 chassis connector, 1 x Speakon NL8MPR chassis power socket. The socket should use the following pin assignment: pins 1-,2-,3-,4- are GND, pins 1+,3+ are +5V, pins 2+,4+ are +12V.
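The pin assignment above can be written down as a small lookup table, which also makes it easy to estimate how much current each Speakon contact has to carry. The following Python sketch is only an illustration: the pin-to-rail mapping comes from the text, while the function names and the per-disk currents in the usage example are hypothetical and should be replaced with values from the disk datasheet.

```python
# Pin-to-rail mapping as quoted in the text; everything else is illustrative.
SPEAKON_PINS = {
    "1-": "GND", "2-": "GND", "3-": "GND", "4-": "GND",
    "1+": "+5V", "3+": "+5V",
    "2+": "+12V", "4+": "+12V",
}


def pins_for(rail):
    """Return the sorted contact names that carry the given rail."""
    return sorted(pin for pin, r in SPEAKON_PINS.items() if r == rail)


def current_per_contact(n_disks, amps_5v_per_disk, amps_12v_per_disk):
    """Split the total current of each rail evenly over that rail's contacts."""
    totals = {
        "+5V": n_disks * amps_5v_per_disk,
        "+12V": n_disks * amps_12v_per_disk,
    }
    return {rail: total / len(pins_for(rail)) for rail, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical per-disk draws (amperes), not measured values.
    load = current_per_contact(n_disks=20, amps_5v_per_disk=0.7,
                               amps_12v_per_disk=2.0)
    for rail, amps in load.items():
        print(f"{rail}: {amps:.1f} A per contact on pins {pins_for(rail)}")
```

Compare the per-contact figures against the connector and cable ratings before powering a fully populated diskpack.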
5 - Software Utilities
The following tools can be of help when building a RAID-0 with the internal or diskpack SATA drives.
- condition_disks.sh: usage ./condition_disks.sh /dev/sd{d,e,f,g,h,i,j} or similar. It does some safety checks, erases the specified disks and builds a two-partition RAID0. The first, small ext3 partition is for user convenience, such as storing owner information or metadata that should be harder to delete accidentally. The second partition is XFS and stores the actual data.
- ibob_utils_v11.tar.gz: VDIF library and VDIF UDP packet capture utility, can be used to record streamed VDIF onto the RAID at over 4 Gbps
- netraid_utilities_v10.tar.gz: udp2raid, raw2raid, raid2raw, raid2udp utilities for capturing UDP or raw Ethernet frames from network to disk, or to send data over the network
- mark5cEmu-v1.0.0.tar.gz: Python-based Mk5C server emulator, allows FieldSystem to connect and control the RAID computer and start network data capture as if it were a Mark5.
- wr-nexgen.c: large sequential I/O benchmark tool for RAID systems that includes multicore and real-time optimizations
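To make the disk conditioning step more concrete, the sketch below generates (but does not run) a command sequence of the kind such a script might issue. It is not the content of condition_disks.sh. It assumes that every disk is split into a small and a large partition, that two RAID-0 arrays are built across them, and that the hypothetical device names /dev/md0 and /dev/md1 and the 1 GiB size of the small partition are acceptable. It also assumes /dev/sdX-style names, where partition N of a disk is the disk name followed by N.

```python
def plan_raid0(disks, meta_size="1GiB", meta_md="/dev/md0", data_md="/dev/md1"):
    """Return the commands (as argument lists) for a two-array RAID-0 layout.

    Nothing is executed here; the caller decides whether to print or run them.
    """
    if len(disks) < 2:
        raise ValueError("RAID-0 needs at least two disks")
    if len(set(disks)) != len(disks):
        raise ValueError("the same disk is listed twice")

    cmds = []
    for disk in disks:
        # New partition table (destroys existing data), then two partitions.
        cmds.append(["parted", "-s", disk, "mklabel", "gpt"])
        cmds.append(["parted", "-s", disk, "mkpart", "primary", "1MiB", meta_size])
        cmds.append(["parted", "-s", disk, "mkpart", "primary", meta_size, "100%"])

    count = "--raid-devices=%d" % len(disks)
    # Small array over the first partitions, large array over the second ones.
    cmds.append(["mdadm", "--create", meta_md, "--level=0", count]
                + [d + "1" for d in disks])
    cmds.append(["mdadm", "--create", data_md, "--level=0", count]
                + [d + "2" for d in disks])

    cmds.append(["mkfs.ext3", meta_md])       # convenience / metadata area
    cmds.append(["mkfs.xfs", "-f", data_md])  # bulk data area
    return cmds


if __name__ == "__main__":
    # Dry run: print the plan for review instead of executing it.
    for cmd in plan_raid0(["/dev/sdd", "/dev/sde", "/dev/sdf"]):
        print(" ".join(cmd))
```

Printing the plan first gives a chance to confirm that none of the listed devices is the system disk, which is the kind of safety check the real script is described as performing.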
6 - iBob Example Firmware (VDIF)
An iBob firmware that streams 2-bit 2-channel data in two VDIF Streams is provided below. The data is sent as UDP jumbo packets, each of which encapsulates a small VDIF frame. No extra headers are added inside the UDP packet; its payload is just the plain VDIF frame. The 2-bit values are encoded in VLBA-style sign/mag (+-1.0, +-3.3359) without a representation for value 0.
- expres4g1024M_v12.rar (37 MB): the full precompiled design including Simulink model, VHDL and all source code.
- expres4g1024M_v12.txt: English description and changelog.
- expres4g1024M_2009_Jun_05_1841.bit: only the bit file, can be flashed onto the iBob's FPGA by following the simple instructions in "IBOB PROM Burning" from Berkeley CASPER.
- vsi4g_v10.rar (30 MB): reads VSI4 geo/astro data from VSI connector(s) and outputs 10GbE VDIF Stream
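Because each UDP payload is a plain VDIF frame, a receiver can interpret the first 16 bytes of the payload directly as the four common VDIF header words. The Python sketch below shows one way to do that and to unpack 2-bit samples. The header bit fields follow the VDIF 1.0 layout. The mapping of 2-bit codes to the four levels and the sample order within a byte are assumptions made for illustration; the firmware description (expres4g1024M_v12.txt) should be consulted for the actual sign/magnitude bit order.

```python
import struct

# Assumed order of levels for codes 0..3; the firmware's real bit order may differ.
ASSUMED_LEVELS = (-3.3359, -1.0, 1.0, 3.3359)


def parse_vdif_header(frame):
    """Decode the first four little-endian 32-bit VDIF header words."""
    if len(frame) < 16:
        raise ValueError("need at least 16 bytes of VDIF header")
    w0, w1, w2, w3 = struct.unpack_from("<4I", frame, 0)
    return {
        "invalid": bool(w0 >> 31),
        "legacy": bool((w0 >> 30) & 1),
        "seconds": w0 & 0x3FFFFFFF,            # seconds since reference epoch
        "ref_epoch": (w1 >> 24) & 0x3F,
        "frame_in_second": w1 & 0xFFFFFF,
        "version": w2 >> 29,
        "n_channels": 1 << ((w2 >> 24) & 0x1F),
        "frame_bytes": (w2 & 0xFFFFFF) * 8,    # length field is in 8-byte units
        "complex": bool(w3 >> 31),
        "bits_per_sample": ((w3 >> 26) & 0x1F) + 1,
        "thread_id": (w3 >> 16) & 0x3FF,
        "station_id": w3 & 0xFFFF,
    }


def decode_2bit(payload, levels=ASSUMED_LEVELS):
    """Unpack four 2-bit samples per byte, least significant bits first."""
    samples = []
    for byte in payload:
        for shift in (0, 2, 4, 6):
            samples.append(levels[(byte >> shift) & 0b11])
    return samples


if __name__ == "__main__":
    # Synthetic frame: 32-byte header, 2 channels, 2 bits/sample, 8 data bytes.
    words = [1234, (18 << 24) | 7, (1 << 24) | 5, (1 << 26) | 0x4D68, 0, 0, 0, 0]
    frame = struct.pack("<8I", *words) + bytes([0b11100100] * 8)
    print(parse_vdif_header(frame))
    print(decode_2bit(frame[32:33]))
```

The usage example builds a synthetic frame rather than reading from the network, so it can be run without an iBOB attached.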
7 - Validation
Disk recording has been validated with XFS and HFS+ file systems on a 20-disk diskpack using OS X 10.5 and Ubuntu Jaunty. Rates exceed 4 Gbps. On OS X rates are close to 5.5 Gbps. A 'capturefs' file system that gets even closer to raw block device rates is under development (as of 08/2009).
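The aggregate rates quoted above translate into fairly modest per-disk requirements when the load is striped evenly over the pack. The short Python sketch below works through that arithmetic. It assumes ideal RAID-0 striping with no file system overhead, and the 1 TB disk size in the usage example is hypothetical.

```python
def per_disk_mbytes_per_s(aggregate_gbps, n_disks):
    """Sustained write rate each disk must hold, in MB/s (10^6 bytes/s)."""
    if n_disks <= 0:
        raise ValueError("n_disks must be positive")
    return aggregate_gbps * 1e9 / 8 / n_disks / 1e6


def recording_hours(n_disks, disk_terabytes, aggregate_gbps):
    """Hours of recording that fit on the pack at the given aggregate rate."""
    total_bits = n_disks * disk_terabytes * 1e12 * 8
    return total_bits / (aggregate_gbps * 1e9) / 3600


if __name__ == "__main__":
    for rate in (4.0, 5.5):
        print(f"{rate} Gbps over 20 disks: "
              f"{per_disk_mbytes_per_s(rate, 20):.1f} MB/s per disk")
    # Hypothetical 1 TB disks.
    print(f"20 x 1 TB at 4 Gbps: {recording_hours(20, 1.0, 4.0):.1f} hours")
```

At 4 Gbps a 20-disk pack needs only 25 MB/s per disk, which suggests that sustained throughput depends more on the controller, port multipliers and file system than on the individual drives.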
The iBOB system has been tested in Venus Express phase referenced spacecraft tracking and planetary water vapour detection throughout 2009 (JIVE, Metsähovi). Papers published in 2009: (list todo)
A 12 Gbps demo observation with Jb, On, Mh is planned for 10/2009 and currently under preparation (status of 09/2009).
This work has received financial support under the EU FP6 Integrated Infrastructure Initiative contract number #026642, EXPReS.