How To Build A Low Cost SAN - Page 3

8 AoE Initiator Setup

On the client side you must have the aoe driver and aoe-tools installed on your system (refer to the section AoE Tools & Commands). You will find the corresponding exported block devices in the /dev/etherd directory. You just have to run the following commands:

[[email protected] trunk]# modprobe aoe
[[email protected] trunk]# lsmod | grep aoe
[[email protected] trunk]# aoe-stat
[[email protected] trunk]# ls -l /dev/etherd/
[[email protected] trunk]# mkfs.ext3 /dev/etherd/e0.0
[[email protected] trunk]# mount /dev/etherd/e0.0 /mnt

So, finally, you have access to your exported block device and are free to perform any kind of read or write operation on it.

 

9 AoE Target Performance Graph

9.1 Hardware Used for Performance Measurement

In this experiment I have used the following hardware to measure disk I/O:

  • 3Com SuperStack 24-port Gigabit switch
  • Five-port Gigabit NIC card
  • 160 GB SATA hard disk
  • Intel(R) Xeon(R) CPU E5450 @ 3.00 GHz
  • 24 GB RAM

My network supports jumbo frames, so I have set the following packet size:

[[email protected] ]# ifconfig eth0 mtu 9000

I have used Fedora 7 (FC7) and Fedora 10 (FC10) because it's not possible to compile all the targets against the same kernel.

 

9.2 Techniques to measure the disk i/o

After successfully setting up all the available AoE targets, it is time to measure the disk I/O of the exported block devices. The available options for measuring block-device I/O are hdparm, iostat, dd, and dedicated I/O tools such as fio, bonnie, iozone, and iometer.
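For a quick first number before reaching for the heavier tools, dd alone gives a rough sequential-write figure. The sketch below is only an illustration; the target path is my own assumption, so point TARGET at a file on the mounted AoE device (e.g. /mnt/ddtest) to measure the exported disk:

```shell
#!/bin/sh
# Rough sequential-write throughput check with dd (a sketch; TARGET is an
# assumed path, set it to a file on the AoE-backed mount such as /mnt/ddtest).
TARGET=${TARGET:-/tmp/aoe-ddtest}
COUNT=${COUNT:-16}                  # number of 1 MB blocks to write
# conv=fdatasync flushes the data to the device before dd prints its rate,
# so the figure reflects the disk rather than the page cache.
dd if=/dev/zero of="$TARGET" bs=1M count="$COUNT" conv=fdatasync
rm -f "$TARGET"
```

This is only a ballpark figure; the fio and bonnie runs below give more controlled measurements.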

 

9.3 Performance Graph with Fio

Here we have used fio (an I/O measurement tool) and plotted the performance graph. I have used the following configuration in the surface-scan file:

[global]
thread=1
bs=128k
direct=1
ioengine=sync
verify=meta
verify_pattern=0xaa555aa5
verify_interval=512

[write-phase]
filename=/dev/etherd/e0.1 ; or use a full disk, for example /dev/sda
rw=write
fill_device=1
do_verify=0

[verify-phase]
stonewall
create_serialize=0
filename=/dev/etherd/e0.1
rw=read
do_verify=1
bs=128k

After setting the appropriate parameters in this file, just run the following command against all the available AoE targets to collect the data for the following graphs:

[[email protected] ]# fio surface-scan

The performance graph of AoE targets with fio in case of write operation is as follows:

Here the X-axis denotes the block size in kilobytes and the Y-axis denotes the throughput in KB/sec (kilobytes per second).

The performance graph of AoE targets with fio in case of read operation is as follows:

 

9.4 Performance Graph with Bonnie-64

We can set up bonnie-64 with the following commands:

[[email protected] bonnie-64-read-only]# mount /dev/etherd/e0.0 /mnt
[[email protected] bonnie-64-read-only]# ./Bonnie -d /mnt/ -s 128

The performance graph of AoE targets with bonnie in case of write operation is as follows:

Here the X-axis denotes the file size in megabytes and the Y-axis denotes the throughput in MB/sec.

 

10 AoE Tools & commands

There are some tools and commands available to analyze AoE traffic on the network. Run the following commands to download and install these tools:

[[email protected] kk]# wget http://support.coraid.com/support/linux/aoe6-70.tar.gz
[[email protected] kk]# tar -xzvf aoe6-70.tar.gz
[[email protected] kk]# cd aoe6-70
[[email protected] aoe6-70]# make
[[email protected] aoe6-70]# make install

Now you have installed the necessary aoe-tools and you are able to use the following commands:

  • aoecfg - manipulate AoE configuration strings
  • aoe-discover - tell aoe driver to discover AoE devices
  • aoe-flush - flush the down devices out of the aoe driver
  • aoe-interfaces - restrict aoe driver to specified network interfaces
  • aoe-mkdevs - create special device files for aoe driver
  • aoe-mkshelf - create special device files for one shelf address
  • aoeping - simple communication with AoE device
  • aoe-revalidate - revalidate the disk size of an aoe device
  • aoe-stat - print aoe device status report
  • aoe-version - print AoE-related software version information
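As a small example of combining these tools, the sketch below filters aoe-stat output down to the devices that are actually up. The awk one-liner is my own addition, not part of aoe-tools; the sample lines mimic aoe-stat's columns (device, size, interface, payload, state):

```shell
#!/bin/sh
# Print only the AoE devices that aoe-stat reports as "up".
# aoe-stat columns: device, size, interface, payload, state.
list_up_devices() {
    awk '$NF == "up" { print $1 }'
}

# Canned sample in aoe-stat's output format; on a real client, pipe the
# live output in instead:
#   aoe-stat | list_up_devices
printf '%s\n' \
    'e0.0 0.209GB eth0 1024 up' \
    'e9.9 0.209GB eth0 1024 down' \
    | list_up_devices
```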

 

11 Making of a Small SAN

So far we have seen the available targets, their corresponding features, and their performance. Now it is time to build a SAN from the available disks and a Gigabit Ethernet switch. The steps to build the SAN follow. While writing this article I did not have spare hard disks, so I performed the experiment on 200 MB loop devices; in a real setup these loop devices would be replaced by actual 200 GB hard disks. The basic diagram of our SAN is as follows:

 

11.1 Server Setup For SAN

Export two devices from server0

[[email protected]]# dd if=/dev/zero of=file1.img bs=1M count=200
[[email protected]]# dd if=/dev/zero of=file2.img bs=1M count=200
[[email protected]]# losetup /dev/loop0 file1.img
[[email protected]]# losetup /dev/loop1 file2.img
[[email protected] vblade-19]# losetup -a
[[email protected] vblade-19]# ./vbladed 0 0 eth0 /dev/loop0
[[email protected] vblade-19]# ./vbladed 1 0 eth0 /dev/loop1

 

Export two devices from server1

[[email protected]]# dd if=/dev/zero of=file1.img bs=1M count=200
[[email protected]]# dd if=/dev/zero of=file2.img bs=1M count=200
[[email protected]]# losetup /dev/loop0 file1.img
[[email protected]]# losetup /dev/loop1 file2.img
[[email protected] vblade-19]# losetup -a
[[email protected] vblade-19]# ./vbladed 0 1 eth0 /dev/loop0
[[email protected] vblade-19]# ./vbladed 1 1 eth0 /dev/loop1
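The per-server steps above can be collected into one small script. This is only a sketch: vbladed's argument order is shelf, slot, interface, device, and eth0 plus the ./vbladed path are taken from the text. It defaults to a dry run that only prints the privileged commands; set DRY_RUN=0 on a real server to execute them.

```shell
#!/bin/sh
# Export two 200 MB loop-backed AoE devices from one server (a sketch).
# Pass the server's slot number: 0 on server0, 1 on server1.
SLOT=${1:-0}
DRY_RUN=${DRY_RUN:-1}       # dry run by default; set to 0 to really run
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

for SHELF in 0 1; do
    IMG="file$((SHELF + 1)).img"
    run dd if=/dev/zero of="$IMG" bs=1M count=200
    run losetup "/dev/loop$SHELF" "$IMG"
    # vbladed arguments: shelf slot interface device
    run ./vbladed "$SHELF" "$SLOT" eth0 "/dev/loop$SHELF"
done
```

Running it with argument 0 reproduces the server0 commands above (e0.0 and e1.0); with argument 1 it reproduces the server1 commands (e0.1 and e1.1).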

11.2 Client Setup For SAN

Exported block-devices at Client:

Make sure that on the client side you have the latest AoE driver and the corresponding tools installed. You can check this with the following command:

[[email protected] krishna]# lsmod | grep aoe

If you don't have the aoe driver on your box, you can download it from the following mirror: http://support.coraid.com/support/linux/

[[email protected] krishna]# aoe-version

aoetools: 29
installed aoe driver: 70
running aoe driver: 70

Now run the following commands to access the exported block devices at the client:

[[email protected] krishna]# modprobe aoe
[[email protected] krishna]# aoe-stat

e0.0 0.209GB eth0 1024 up
e0.1 0.209GB eth0 1024 up
e1.0 0.209GB eth0 1024 up
e1.1 0.209GB eth0 1024 up

So, you can see the exported block devices on your box.

 

Creation of Raid Array:

Mirroring from e0.0 and e0.1:

[[email protected] krishna]# mdadm -C /dev/md0 -l 1 -n 2 /dev/etherd/e0.0 /dev/etherd/e0.1

Mirroring from e1.0 and e1.1:

[[email protected] krishna]# mdadm -C /dev/md1 -l 1 -n 2 /dev/etherd/e1.0 /dev/etherd/e1.1

Create the stripe over the mirrors:

[[email protected] krishna]# mdadm -C /dev/md2 -l 0 -n 2 /dev/md0 /dev/md1

So, now we have following configuration of Raid Array:

[[email protected] krishna]# cat /proc/mdstat

Personalities : [raid1] [raid0]
md2 : active raid0 md1[1] md0[0]
      409344 blocks 64k chunks
md1 : active raid1 etherd/e1.1[1] etherd/e1.0[0]
      204736 blocks [2/2] [UU]
md0 : active raid1 etherd/e0.1[1] etherd/e0.0[0]
      204736 blocks [2/2] [UU]
unused devices: <none>


Convert RAID md2 into an LVM

Convert the RAID into an LVM physical volume:

[[email protected] krishna]# pvcreate /dev/md2

Create an extendible LVM volume group:

[[email protected] krishna]# vgcreate volgrp /dev/md2
[[email protected] krishna]# pvs

PV       VG     Fmt  Attr PSize   PFree
/dev/md2 volgrp lvm2 a-   396.00M 396.00M
[[email protected] krishna]# vgs
VG     #PV #LV #SN Attr   VSize   VFree
volgrp   1   0   0 wz--n- 396.00M 396.00M

Create a logical volume using all the space:

[[email protected] aoedoc]# lvcreate -L 300M -n logicalvol volgrp
[[email protected] aoedoc]# lvs

LV         VG     Attr   LSize
logicalvol volgrp -wi-a- 300.00M

So, finally, we have created our logical volume with a size of 300 MB.

Create a filesystem:

[[email protected] aoedoc]# mkfs.ext3 /dev/volgrp/logicalvol
[[email protected] aoedoc]# mount -t ext3 /dev/volgrp/logicalvol /mnt/
[[email protected] aoedoc]# cd /mnt/
[[email protected] mnt]# mkdir aoe
[[email protected] mnt]# touch aoefile
[[email protected] mnt]# ls

aoe aoefile lost+found

So, finally, your SAN is ready. If you want to resize your volume group or add more disks, first unmount the filesystem and then use vgextend and resize2fs.
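The vgextend/resize2fs step looks roughly like the sketch below. It assumes a new AoE disk has been exported as /dev/etherd/e2.0 (a hypothetical device name, not from the setup above); the other names match this section. The block deliberately does nothing on a machine where the article's volume group is absent.

```shell
#!/bin/sh
# Grow volgrp/logicalvol using a newly exported AoE disk (hypothetical e2.0).
# Only proceeds when the volume group from this article actually exists.
if [ -e /dev/volgrp/logicalvol ]; then
    umount /mnt                               # resize offline, as noted above
    pvcreate /dev/etherd/e2.0                 # prepare the new disk for LVM
    vgextend volgrp /dev/etherd/e2.0          # add it to the volume group
    lvextend -L +200M /dev/volgrp/logicalvol  # grow the logical volume
    e2fsck -f /dev/volgrp/logicalvol          # resize2fs requires a clean fsck
    resize2fs /dev/volgrp/logicalvol          # grow ext3 to fill the LV
    mount -t ext3 /dev/volgrp/logicalvol /mnt
else
    echo "volgrp/logicalvol not found; nothing to resize"
fi
```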


Comments

From: krishna

I am confused by the network RAID layout and why you would do it in such a way. The use of RAID0 for performance is completely negated by having a mirrored and a striped volume on the same system due to network bottlenecks, and this is assuming that we replace the loop files with physical disks. Would it not be better to simply RAID0 the disks on each file server and then do a RAID1 mirror on the client?

>>> There is nothing to be confused about. I have just given the outline. You can use whatever RAID and LVM combination you want; it's up to you what you want to achieve.

With some forethought, this can easily provide a mechanism to replicate the entire SAN to a second unit. Simply export the original RAID device via DRBD and have the second machine be a member of the DRBD group. Again, bond some NIC cards, and possibly direct-connect the two boxes if they live together, to keep that traffic off the network.

>>> Yes, of course. DRBD is RAID1 over the network, and in my next article you will see how I have used it in my HA SAN.

 

Best Regards,

Krishna

From:

OpenSolaris b129: ZFS with deduplication, CIFS, and iSCSI.

Put a box together with your desired number of disks.

Decide on RAID5 or RAID6 (raidz ~= RAID5 and raidz2 ~= RAID6).

$zpool create san raidz2 disk1 disk2 disk3 etc etc etc

$zfs create -V (size)g san/volume1

$zfs set shareiscsi=on san/volume1

$zfs set dedup=on san/volume1

$iscsitadm list target
Target: san/volume1
    iSCSI Name: iqn.##:blahblah

If you really want to use AoE instead of iSCSI, then install the CORAID AoE package and read the man pages.

I'm not trying to make this a deduplication thing, but if you are investing in a homebrew SAN device, just get 8 GB of RAM and a quad core and enable dedup simply for the performance gain in writes. The dedup mechanism is in-line, so every block written will be deduped if possible, and a dedup operation on a 20 MB file could make it seem like just a 20 KB file.

Important things to get for a SAN:

Get a SAS card with cache and a battery backup for the cache, redundant power supplies for the server, and a UPS dedicated to the SAN.

I prefer to do the RAID in software because I don't have to worry about losing a controller. If the controller or even the whole server crashes, I can migrate the drives to something else and import them with Linux software RAID or ZFS.

I also like ZFS as it has matured and is pretty reliable now; it offers deduplication, compression, and the ability to make multiple copies of files within a ZFS volume in addition to RAID, as well as the ability to put one higher-performance drive in the system (like an SSD) for caching writes, which can improve performance for small files significantly. It can also do online, nearly instant snapshots, which are read-only mountable and exportable for backups.

 

From: Krishna Kumar

Hi ,

Using ZFS is a good suggestion; I have also thought about it. But I don't think we can use it directly due to some open-source licensing issues. FUSE can be one alternative for using it.

Second

When we are talking about a SAN, we need a cluster-aware, reliable file system; here GFS and btrfs seem to be very good alternatives, and they come with the Linux kernel. So we can use them.

 

Third

I have written this article for AoE, If you want to go for iSCSI, you can go for it.

 

Best Regards,

Krishna

 

From:

I am confused by the network RAID layout and why you would do it in such a way. The use of RAID0 for performance is completely negated by having a mirrored and a striped volume on the same system due to network bottlenecks, and this is assuming that we replace the loop files with physical disks. Would it not be better to simply RAID0 the disks on each file server and then do a RAID1 mirror on the client?

 Server1:

/dev/md0 contains a raid0 of disk1 and disk2

exports shelf0 slot0 aka e0.0

Server2:

/dev/md0 contains a raid0 of disk3 and disk4

exports shelf1 slot0 aka e1.0

Also consider that the concept of a 'shelf' is to describe the site or home of a disk, so simply for organization's sake all disks in a single server should be on the same shelf.

 

Client1:

/dev/md0 is a raid1 of /dev/etherd/e0.0 and /dev/etherd/e1.0

 

This would be faster simply because each server would not have to maintain both a stripe and a mirror.

 

Though I make this suggestion, I would not recommend a client-side RAID setup on a SAN in the first place. It would be much better to have a single SAN box with a redundant copy via DRBD.

Server1:

Many disks in a RAID array. Partition those disks with LVM and export the LVs via vblade. Then you can have more disks in a RAID 5, 6, or 10. You can do switch-level NIC bonding to get multiple Gigabit channels out to the switch (and do the same on the client if necessary).

 

With some forethought, this can easily provide a mechanism to replicate the entire SAN to a second unit. Simply export the original RAID device via DRBD and have the second machine be a member of the DRBD group. Again, bond some NIC cards, and possibly direct-connect the two boxes if they live together, to keep that traffic off the network.

In my next post I will outline what I have done.

From: balaji

Thank you for the article. Nice one.

As of now, I am trying to learn SAN technology.

Could you please give some URLs/links which explain SAN concepts and configuration in depth?

I would be very happy if you provide some more information about SAN.

Thanks.

 
