How To Build A Low Cost SAN

Krishna Kumar
April 9, 2009

1 Objective

In today's world there is an obvious need for information sharing in every department, and network storage can help us meet this growing challenge. In this article we focus on building a SAN with the following features:

  • Low cost and easily affordable
  • Ensured scalability
  • High reliability
  • Easy manageability
  • High performance
  • Ensured security
  • High availability


2 Available Options for a SAN

There are several options available for building a reliable SAN, but they are quite complex and expensive: iSCSI, NBD, ENBD and Fibre Channel. iSCSI, NBD and ENBD work on top of the TCP/IP layer, which carries a lot of overhead. Luckily, there is a protocol that can serve our purpose at an affordable cost and with less overhead; all we typically need is some dual-port Gig-E cards, a Gigabit Ethernet switch and some disks. This very simple and lightweight protocol is known as ATA over Ethernet (AoE), and it comes with the Linux kernel as a kernel module.

AoE does not rely on network layers above Ethernet, such as the IP and TCP that iSCSI requires. While this makes AoE potentially faster than iSCSI, with less load on the host to handle the additional protocols, it also means that AoE is not routable outside a LAN. AoE is intended for SANs only; in this regard it is more comparable to Fibre Channel over Ethernet than to iSCSI. It exports block devices (e.g. SATA hard disks) over the network with very high throughput when coupled with a quality Ethernet switch, which can maximize throughput and minimize collisions through integrity checking and packet ordering. When using AoE in a large, scalable enterprise environment, we can take the help of Red Hat cluster-aware tools such as CLVM, GFS, DLM, DRBD and Heartbeat.
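
To make this concrete, here is a minimal sketch of exporting a single disk and attaching it from a client, using the vblade target described later in this article. The device and interface names (/dev/sdb, eth0) are placeholder assumptions; adjust them for your hardware:

  # on the storage server (target): export /dev/sdb as shelf 0, slot 0 via eth0
  vbladed 0 0 eth0 /dev/sdb

  # on the client (initiator): load the AoE kernel module; the exported disk
  # then appears as /dev/etherd/e0.0 and behaves like a local block device
  modprobe aoe
  mkfs.ext3 /dev/etherd/e0.0
  mount /dev/etherd/e0.0 /mnt/storage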


3 Cost Comparison among AoE, FC & iSCSI

Cost Comparison

Technology       Speed   Server Interface   Switch       Cabling     Storage/TB
AoE              2Gb     $99                $15-$30      $25-$35     $400-$500
iSCSI            1Gb     $500-$1000         $400-$600    $25-$35     $1000-$5000
Fibre Channel    4Gb     $1200-$2000        $800-$3600   $175-$225   $4000-$10000


4 Comparison of AoE vs iSCSI

4.1 Advantages of AoE over iSCSI

  • AoE is a cheaper and simpler software stack. One advantage of AoE is that you don't have the overhead of translating ATA to SCSI and back to ATA when you are using ATA drives, so there is a performance gain.
  • Server processing load for iSCSI is much higher than for AoE at equivalent throughput; AoE spares those processing cycles, while iSCSI requires TCP/IP and its requisite complexity.
  • AoE is not a routable protocol, which gives it some inherent security.
  • AoE Ethernet frames are passed by standard switches.
  • AoE and iSCSI both have initiator support for Windows and Linux (a Linux discovery sketch follows this list).
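
On Linux, the aoetools package ships small helpers for discovering and inspecting targets from the initiator side. A minimal sketch (the device shown is an assumption carried over from the export example above; real output will differ):

  modprobe aoe      # load the initiator driver
  aoe-discover      # probe all interfaces for AoE targets
  aoe-stat          # list what was found, e.g.:
  #     e0.0     500.107GB   eth0 up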


4.2 Disadvantages of AoE compared to iSCSI

  • If you need features such as encryption, routability and user-based access in the storage protocol, iSCSI is the better choice.
  • AoE is less suitable for critical enterprise applications, and it is not as scalable as iSCSI or Fibre Channel when you consider location: with Fibre Channel and iSCSI you can scale your storage across locations, which AoE cannot do because its traffic cannot be routed.
  • ATA disks are not as reliable as their SCSI counterparts.


5 Available AoE Targets

The following AoE targets (servers) are available under the GPL:

  • Kvblade
  • Aoeserver
  • Vblade-Kernel
  • Vblade
  • Ggaoed
  • Qaoed


6 Feature Comparison of available AoE Targets

You can export your block devices over the network with any of the available targets. The question, however, is how to export them in a well-configured and manageable way that helps us achieve our goals. The following features are available for configuring your block devices, either on the command line or in a configuration file. These are the acronyms used in the table for the AoE targets:

  • KV - Kvblade
  • AOES - Aoeserver
  • VB-KER - Vblade-kernel
  • VB - Vblade
  • GGOLD - Ggaoed base version
  • GGNW - Ggaoed updated version
  • QD - Qaoed base version
  • SD - Sqaoed (ported version of qaoed on Solaris 10)

A short description of each feature is given in the terms and terminology section.

Features

FEATURE          KV    AOES   VB-KER   VB     GGOLD   GGNW   QD     SD
Shelf            Y     Y      Y        Y      Y       Y      Y      Y
Slot             Y     Y      Y        Y      Y       Y      Y      Y
Interface        Y     Y      Y        Y      Y       Y      Y      Y
Device-path      Y     Y      Y        Y      Y       Y      Y      Y
Conf-file        N     N      N        N      Y       Y      Y      N
MTU              N     N      N        N      Y       Y      Y      N
MAC-filtering    N     Y      N        Y      Y       Y      Y      N
ACL-listing      N     N      N        N      Y       Y      Y      N
Buffer count     N     N      Y        Y      Y       Y      N      N
Sectors          N     N      Y        N      N       N      N      N
Queuing          N     N      N        N      Y       Y      N      N
Logging-info     N     N      N        N      Y       Y      Y      N
Direct mode      N     N      N        Y      Y       Y      Y      N
Sync mode        N     N      N        Y      N       N      N      N
Read-only mode   N     N      N        Y      Y       Y      Y      N
UUID             N     N      N        N      Y       Y      Y      N
Write cache      N     N      N        N      N       N      Y      N
Policy           N     N      N        N      Y       Y      Y      N
Trace-I/O        N     N      N        N      Y       Y      N      N
Jumbo frames     Y     Y      Y        Y      Y       Y      Y      Y
On GPL           Y     Y      Y        Y      Y       Y      Y      Y
Reliability      Low   Med    Low      High   High    Med    High   Med
Usability        Low   Med    Low      High   High    Med    High   Med
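
To see how some of these features surface in practice, here is vblade's command-line synopsis, which maps directly onto the table rows for buffer count, direct mode, sync mode, read-only mode and MAC filtering. The shelf, slot, interface and device values in the example below are placeholder assumptions:

  vblade [-b bufcnt] [-d] [-s] [-r] [-m mac[,mac...]] shelf slot netif filename

  # Export /dev/sdc read-only (-r), with O_DIRECT I/O (-d), restricted to two
  # permitted initiator MACs (-m); all values here are illustrative:
  vblade -r -d -m 00:11:22:33:44:55,00:11:22:33:44:66 0 1 eth0 /dev/sdc

The configuration-file driven targets (ggaoed, qaoed) express the same kinds of settings in their config files instead.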

Comments

From: Anonymous at: 2009-12-23 16:10:09

'ATA disks are not as reliable as their SCSI counterparts.'

This is wrong in this context. The reliability of the hardware is not relevant to the protocol.

From: Anonymous at: 2009-12-25 19:16:31

Agreed. In fact, you can make the point that you can create a MORE reliable array for the same money out of SATA disks.

The reasoning here is that if you can buy a SATA disk for half the cost, you can simply buy your redundant drives and have them on hand. The failure rate of SATA drives vs SAS drives is not double, so you can get a more reliable array if you concede that you will replace drives at least slightly more often.

Additionally, you can have more levels of redundancy with SATA drives for the same money. Thinking of RAID5 with a hot spare? How about RAID6 with 2 hot spares? How about RAID10 plus 2 hot spares? SATA drives are cheaper and larger, so a RAID10 can be had for less money than a SAS RAID5.

Drawbacks? RPMs. Raptors defeat the advantage of SATA because of price, and SAS drives go up to 15,000 RPM. Luckily, you are likely to hit a network bottleneck once you have 6 active disks (with a 12-disk RAID10 you get the performance of 6 drives).

If you use a filesystem like XFS or ZFS, you can use a very fast SSD to improve your I/O: XFS can push its transaction log to a different device, and ZFS can use a fast disk as an inline cache. This way you get the benefits of SATA in drive size and price, plus one fast/expensive SSD to bridge the gap in access times (~8ms for 7200rpm SATA vs ~4ms for 15krpm SAS).
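
Both tricks mentioned here take only a command or two. A hedged sketch, assuming the SSD is /dev/sdd (partitioned), the data array is /dev/md0, and a hypothetical ZFS pool named san; the two halves are alternatives, not a combined recipe:

  # XFS: place the journal on the SSD at mkfs time, then mount with the same log device
  mkfs.xfs -l logdev=/dev/sdd1 /dev/md0
  mount -o logdev=/dev/sdd1 /dev/md0 /mnt/san

  # ZFS: add SSD partitions as an intent log (absorbs synchronous writes)
  # and as an L2ARC read cache
  zpool add san log /dev/sdd1
  zpool add san cache /dev/sdd2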

From: Anonymous at: 2011-01-06 03:19:58

" The failure rate of SATA drives vs SAS drives is not double. "

Desktop SATA - 10^14 URE rate

Enterprise SAS - 10^15 or better URE rate

What then?

From: krishna at: 2009-12-29 10:36:50

> I am confused by the network raid layout and why you would do it in such a way. The use of raid0 for performance is completely negated by having a mirrored and a striped volume on the same system due to network bottlenecks, and this is assuming that we replace the loop files with physical disks. Would it not be better to simply raid0 the disks on each fileserver and then do a raid1 mirror on the client?

>>> There is nothing to be confused about. I have just given the outlines. You can use whatever RAID and LVM combination you want; it's up to you what you want to achieve.

> With some forethought, this can easily provide a mechanism to replicate the entire SAN to a second unit. Simply export the original raid device via DRBD and have the second machine a member of the DRBD group. Again, bond some NIC cards and possibly direct-connect the two boxes if they live together, to keep that traffic off the network.

>>> Yes, of course; DRBD is RAID1 over the network. In my next article you will see how I have used it in my HA SAN.


Best Regards,

Krishna

From: Anonymous at: 2009-12-23 21:06:50

OpenSolaris b129: ZFS with deduplication, CIFS, and iSCSI.

Put a box together with your desired number of disks.

Decide on RAID5 or RAID6 (raidz is roughly RAID5, raidz2 roughly RAID6):

$ zpool create san raidz2 disk1 disk2 disk3 etc etc etc

$ zfs create -V (size)g san/volume1

$ zfs set shareiscsi=on san/volume1

$ zfs set dedup=on san/volume1

$ iscsitadm list target
Target: san/volume1
    iSCSI Name: iqn.##:blahblah

If you really want to use AoE instead of iSCSI then install the CORDaoe pkg and read the man pages.

I'm not trying to make this a deduplication thing, but if you are investing in a homebrew SAN device, just get 8GB of RAM and a quad core and do dedup, simply for the performance gain in writes. The dedup mechanism is in-line, so every block written will be deduped if possible, and a dedup operation on a 20MB file could make it seem like just a 20KB file.

Important things to get for a SAN: a SAS card with cache and a battery backup for the cache, redundant power supplies for the server, and a UPS dedicated to the SAN.

I prefer to do the RAID in software because I don't have to worry about losing a controller. If the controller or even the whole server crashes, I can migrate the drives to something else and import them with Linux software RAID or ZFS.

I also like ZFS as it has matured and is pretty reliable now. It offers deduplication, compression, and the ability to make multiple copies of files within a ZFS volume in addition to RAID, as well as the ability to put one higher-performance drive in the system (like an SSD) for caching writes, which can improve performance for small files significantly. It can also do online, nearly instant snapshots, which are read-only mountable and exportable for backups.


From: Krishna Kumar at: 2009-12-28 11:30:00

Hi,

Using ZFS is a good suggestion; I have also thought about it. But I don't think we can use it directly, due to open-source licensing issues. FUSE can be one alternative for using it on Linux.

Second: when we are talking about a SAN, we need a cluster-aware, reliable file system. Here GFS and Btrfs seem very good alternatives, and they come with the Linux kernel, so we can use them.

Third: I have written this article for AoE. If you want to go for iSCSI, you can go for it.

Best Regards,

Krishna


From: Anonymous at: 2009-12-23 20:30:45

I am confused by the network raid layout and why you would do it in such a way. The use of raid0 for performance is completely negated by having a mirrored and a striped volume on the same system due to network bottlenecks, and this is assuming that we replace the loop files with physical disks. Would it not be better to simply raid0 the disks on each fileserver and then do a raid1 mirror on the client?

 Server1:

/dev/md0 contains a raid0 of disk1 and disk2

exports shelf0 slot0 aka e0.0

Server2:

/dev/md0 contains a raid0 of disk3 and disk4

exports shelf1 slot0 aka e1.0

Also consider that the concept of 'shelf' describes the site or home of a disk, so simply for organization's sake all disks in a single server should be on the same shelf.


Client1:

/dev/md0 is a raid1 of /dev/etherd/e0.0 and /dev/etherd/e1.0


This would be faster simply because each server would not have to have both a stripe and a mirror.
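
A minimal sketch of this suggested layout with Linux software RAID (disk and interface names are placeholder assumptions):

  # on server1 (server2 is identical but exports shelf 1):
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
  vbladed 0 0 eth0 /dev/md0

  # on client1: mirror the two exported devices
  modprobe aoe
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/etherd/e0.0 /dev/etherd/e1.0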


Though I make this suggestion, I would not recommend a client-side raid setup on a SAN in the first place. It would be much better to have a single SAN box with a redundant copy via DRBD.

Server1:

many disks in a RAID array. Partition those disks with LVM and export the LVs via vblade. Then you can have more disks in a RAID5, 6, or 10. You can do switch-level NIC bonding to get multiple Gigabit channels out to the switch (and do the same on the client if necessary).
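
As a sketch of that server-side arrangement (all names and sizes are illustrative assumptions):

  # one large array, carved up with LVM; each LV is exported separately
  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
  pvcreate /dev/md0
  vgcreate sanvg /dev/md0
  lvcreate -L 500G -n vol1 sanvg
  vbladed 0 0 bond0 /dev/sanvg/vol1   # bond0 = the bonded-NIC interface mentioned above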


With some forethought, this can easily provide a mechanism to replicate the entire SAN to a second unit. Simply export the original raid device via DRBD and have the second machine a member of the DRBD group. Again, bond some NIC cards and possibly direct-connect the two boxes if they live together, to keep that traffic off the network.

In my next post I will outline what I have done.

From: balaji at: 2010-02-01 10:30:17

Thank you for the article. Nice one.

As of now, I am trying to learn SAN technology.

Could you please give some URLs/links which explain SAN concepts and configuration in depth?

I would be very happy if you provide some more information about SAN.

Thanks.


From: rasker at: 2010-02-06 12:13:01

Hi, fascinating article. Thanks very much for writing it.

I was wondering what, in your opinion, is the most interesting target? This is with respect to performance and simplicity of configuration. Did you reach any conclusions in your testing?

From: Anonymous at: 2010-02-24 18:17:36

I would go for vblade, ggaoed or qaoed, depending on my need.