An assortment of indigestible things

Disk space on the cheap: my experiences with alternative storage vendors

It took me a long time to think of an appropriate title for this post. I didn’t want to upset anyone by calling their products ‘consumer grade’ when that’s not how they’re marketed, and I couldn’t think of a way to say ‘disk arrays that aren’t made by Oracle, HP, IBM or NetApp’ that didn’t somehow denigrate the competition. I toyed with ‘sub-prime’ but that has other unfortunate connotations 😛

Working as I do for a big important library, I see much call for storage of large archival data sets that need to be available all the time but don’t require blistering performance. The usual enterprise storage vendors leave a bit of a gap in the market here: they concentrate on squillions of IOPS and cable-melting transfer rates, whereas what I actually want is a shedload of disk space, in a fairly small number of rack U, for a reasonable wedge of cash. If users of our niche websites need to wait ten seconds for their data, it’s not the end of the world. It’s also nice to have storage with different architectures and filesystems, so that I can mirror data without worrying about some obscure FS bug screwing up my data at both ends.

I have therefore bought a few disk arrays from companies that you wouldn’t normally associate with enterprise storage (there, good enough?). Having had the opportunity to familiarise myself with them in production, here are a few thoughts. I wouldn’t go so far as to call them reviews, as I don’t really talk much about performance… which is actually quite lucky, because neither of the following units provides any performance reporting capability.

Thecus N16000

Thecus has something of an identity crisis. Its website says it is ‘Creator in Storage’ (probably a bad translation—the first of many, as we will see). The front panel display says ‘Storage Leader Thecus’ (ditto). But I nit-pick. Suffice it to say that, before all this, I only knew Thecus as the manufacturer of small NAS boxes for the homes of geeks.

I have four N16000s, Thecus’s top-of-the-line bit of enterprise kit. My units have sixteen 3TB disks each, which is a fair old wedge of storage considering how cheap they were, and a pretty good density, as each unit is only 3U high. They have a pair of PSUs, a smattering of eSATA and USB ports, VMware certification, and frankly they look the mutt’s nuts.

Build quality and installation

Having said all that, the build quality isn’t brilliant. The units ship covered in sticky plastic, presumably to protect the unpainted metal case, which is pretty thin and flexible. The drives themselves, which in my units were packed separately, are in caddies with handles that feel like they would snap if pushed too hard. To get access to the drive bays, you have to undo two thumbscrews and the front panel hinges downwards: adequate, just not very slick, and the ribbon connector that services the front panel display looks awfully vulnerable. The racking kit is a third-party set of sliding rails, which is perfectly adequate and supports the unit nicely even when fully extended. From the back, the N16000 looks like a rack-mount PC.

I dispensed with the configuration software and set IP addresses from the front panel. This is a massive, massive plus in Thecus’s favour! It meant that I could get the N16000s on the network without farting about with laptops and crossover cables; initial configuration took less than two minutes per unit.

Unfortunately, one of the units immediately reported a fan failure (thank you again, front panel display!). This meant that I had to take the lid off the thing. There was no indication on the unit as to which panel had to be removed (this is something Oracle does very well: their units are literally covered in diagrams and instructions), so I guessed at the biggest one. Removing the screws was difficult, as they are very small and made of crappy soft metal, so I had to be very careful not to damage them. I don’t think it would take many cycles of lid-lifting before a screw head got stripped. Having removed the lid, I immediately saw the problem: one of the four fans had shifted in transit, and its edge connector wasn’t seated correctly. It alarmed me just how easy it was to re-seat: the connector offers almost no resistance, to the point where I thought it must have been broken. I made a mental note that, if I ever need to move these things, I may well have to pop the lid to sort out the fans.

Initial setup

The N16000s are configurable with a browser, so no software to install, which is definitely a good thing. Unfortunately my heart sank when I first logged in to an N16000.

When you’ve worked in IT for a while, you get a feel for what a bit of kit is going to be like pretty quickly. The big friendly icons coupled with ‘iTunes Server’ told me that this is not enterprise-grade stuff. What kind of enterprise wants a 48TB iTunes server?! I asked it to give me a big fat RAID 5 across all the disks bar one, which I configured as a spare, and set about configuring NFS and then iSCSI so that I could do some basic benchmarks (although performance isn’t my priority, there’s no point shooting myself in the foot). For some reason it made me select a filesystem when configuring the RAID, which makes no sense if I’m then going to use iSCSI, but whatever…
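For a sense of scale, here is the capacity arithmetic for one unit set up this way. It is a rough sketch assuming a ‘3TB’ disk is exactly 3×10^12 bytes (real drives are slightly larger) and ignoring filesystem overhead. The usable figure comes out at roughly 39,100GiB, in the same ballpark as the 39088GB the GUI reports in the SNMP ticket further down, which suggests the GUI’s ‘GB’ are really binary GiB.

```python
# Rough capacity arithmetic for one N16000: 16 disks, one spare,
# the remaining 15 in a single RAID 5 set.
# Assumption: "3TB" means exactly 3 * 10**12 bytes; filesystem overhead ignored.

TB = 10**12   # decimal terabyte, as printed on drive labels
GIB = 2**30   # binary gibibyte


def raid5_usable_bytes(disks: int, spares: int, disk_bytes: int) -> int:
    """Usable bytes of one RAID 5 set: one member disk's worth goes to parity."""
    members = disks - spares
    if members < 3:
        raise ValueError("RAID 5 needs at least three member disks")
    return (members - 1) * disk_bytes


raw = 16 * 3 * TB
usable = raid5_usable_bytes(disks=16, spares=1, disk_bytes=3 * TB)
per_u = raw / 3  # the chassis is 3U high

print(f"raw {raw / TB:.0f}TB ({per_u / TB:.0f}TB per U), "
      f"usable {usable / TB:.0f}TB = {usable / GIB:,.0f}GiB")
# raw 48TB (16TB per U), usable 42TB = 39,116GiB
```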

This is when I discovered that the web interface is, sorry Thecus, crap. If you ask it to do a long operation, like setting up an iSCSI LUN, it will display ‘please wait’ and go away. For a long, long time. Sometimes forever, so that you have to do a reset from the front panel. Even selecting ‘Disk information’ takes a good 40 seconds to come back. What’s more, the messages are so badly translated that it’s occasionally difficult to know exactly what is happening; I particularly dislike status messages that end in an exclamation mark, because it makes it look like the developer was genuinely surprised to see it work correctly!

Warning: The feature is not available due to Samba protocol has disabled

Sadly, the manual is just as badly translated:

Please be sure that ‘Standby’ unit volume size must be larger then ‘Active’ unit. Or the HA synchronize will result failed.

Production and support

I hit a problem almost immediately: I couldn’t mount the thing from an NFS client. No matter what version and options I specified, all I got back was ‘Stale NFS file handle’. I logged a ticket with Thecus expecting a pretty quick response, as this effectively meant that all four arrays were unusable… but it took three days for them to ask for configuration information, 12 days to reproduce the problem, and a full 25 days to give me a patched firmware revision (1.00.10.3). Note that this wasn’t release firmware (and as of September 2011 still isn’t), and although it fixed the problem, I’m reluctant to press these units into production until I get a firmware release that works for me.

(Did I mention that some JavaScript on Thecus’s support website hangs Firefox on Ubuntu when you try to log a ticket? No? Grrrrr.)

I now have another problem relating to SNMP: the values returned bear no resemblance to the disk space actually available or used. My ticket speaks for itself:

I’m trying to get the size and utilisation of the partition on one of my N16000s, but the values returned by SNMP seem to be wrong.

In the GUI, the RAID size is reported as 39088GB, with 54.2GB used. However, when I query the device with SNMP, I get two possible partitions: /raidsys/0 and /raid0. The values reported are:

/raidsys/0:
Size: 129869 * 4096 bytes (about 531MB)
Used: 6604 * 4096 bytes (about 27MB)

/raid0:
Size: 1656220512 * 4096 bytes (about 6.8TB)
Used: 14199338 * 4096 bytes (about 58GB — this is about right)
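The ticket’s arithmetic is easy to check, and doing so makes the shape of the bug clearer. The used figure converts to 54.2GiB, matching the GUI’s 54.2GB if the GUI counts in binary units. As for the size, one possible explanation (purely a guess, not confirmed by Thecus) is a 32-bit overflow: SNMP storage tables such as HOST-RESOURCES-MIB’s hrStorageSize hold 32-bit integers, and a 39088GiB volume counted in 4096-byte blocks needs 34 bits. The sketch assumes the GUI figure is in GiB and that the allocation unit is 4096 bytes, as reported.

```python
ALLOC_UNIT = 4096  # bytes per block, as reported in the ticket
GIB = 2**30


def human(n: int) -> str:
    """Format a byte count in decimal units, as the ticket does."""
    for unit, factor in (("TB", 10**12), ("GB", 10**9), ("MB", 10**6)):
        if n >= factor:
            return f"{n / factor:.1f}{unit}"
    return f"{n}B"


# (size, used) in 4096-byte blocks, copied from the ticket
reported = {
    "/raidsys/0": (129_869, 6_604),
    "/raid0": (1_656_220_512, 14_199_338),
}

for mount, (size_blocks, used_blocks) in reported.items():
    print(f"{mount}: size {human(size_blocks * ALLOC_UNIT)}, "
          f"used {human(used_blocks * ALLOC_UNIT)} "
          f"({used_blocks * ALLOC_UNIT / GIB:.1f}GiB)")

# Hypothesis, NOT confirmed by Thecus: the real size is the GUI's 39088 "GB"
# read as GiB, and the block count has been truncated to 32 bits.
gui_size_gib = 39_088
true_blocks = gui_size_gib * GIB // ALLOC_UNIT  # needs 34 bits
wrapped_blocks = true_blocks % 2**32            # what survives in 32 bits
gap = wrapped_blocks - reported["/raid0"][0]

print(f"expected {true_blocks:,}, wrapped {wrapped_blocks:,}, "
      f"reported {reported['/raid0'][0]:,}, gap {gap * ALLOC_UNIT / GIB:.1f}GiB")
```

The wrapped value lands within about 2GiB of what SNMP reports, which a rounded GUI figure or filesystem overhead could account for; it is suggestive, not proof.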

This time, the bug was confirmed within two days and passed to R&D, but then the ticket was closed without notifying me! I re-opened it when I noticed, but I’m still waiting for a resolution.

(Sorry to get a bit ranty about customer service, but this is a real pet peeve of mine: the problem is not resolved until the customer says so, or you have exhausted all options. Grrrr again.)

Conclusions

In my opinion, the Thecus N16000 is not ready for production. If Thecus were serious about enterprise-grade kit, it would take a long, hard look at its QA department, as the issues I’ve uncovered in a very short time should have been picked up long before the units arrived in my server room. I’ve no doubt that, given time, the firmware will mature into a good product, but we as IT folk place a lot of faith in our storage equipment, and that faith is quickly eroded. Browsing through the inaptly named ‘I Love Thecus’ forums, I see that I am not alone.

Drobo B800i

Drobo (formerly Data Robotics) makes really lovely things. I know I should be writing about performance, support and compatibility, but Drobos are shiny. I stumbled across them at Infosec, of all places, and was struck by their design, all curved edges and shiny plastic. They showed me how you can pop disks out and replace them with larger units without any configuration, and without interrupting service. As we discussed pricing, the front panel glinted in the harsh light of Earls Court. Soon afterwards I ordered two Drobo B800is with 8x3TB disks each.

Build quality and installation

In stark contrast to the Thecus units, the Drobos look and feel solid. The thick metal case is powder-coated, and while it does bow slightly outwards at the top, it certainly doesn’t feel like it would bend under pressure. The front panel is black and mysterious, and clips on firmly with magnets (which I can’t help but wonder about, being someone who used to scrawl ‘MAGNETIC MEDIA: DO NOT X-RAY’ on envelopes, but I’m sure someone far cleverer than me has thought about the wisdom of having magnets so close to disks). Pulling off the panel gives access to the eight drive bays, which take bare SATA drives. I really like this, as I can buy commodity disks and slot them in without messing around with caddies and tiny screws. The disks push past large spring-loaded tabs which make them clunk into place; the whole arrangement feels very solid, with no ‘play’ in the disks at all once seated. The tabs also serve as the disk status lights. As a cute little feature, there is a bar-graph of blue lights at the right-hand side which indicates how much disk space has been used.

Sadly, the whole affair is let down by the racking kit (sold separately). It’s just a piece of powder-coated steel that forms a tray under the unit, with tabs at the front for bolting to the front rack rails. Unfortunately the thing simply isn’t strong enough, and even when firmly bolted in, the unit sags at the back. The sag is so bad that I had to leave a spare U underneath, because the equipment below scraped along the underside of the Drobo whenever it was slid out on its rails. A B800i full of disks is surprisingly heavy, and I think what it really needs is a proper rail kit with fixings front and back. The racking kit really feels like an afterthought; if I ever have to move them, I may end up putting them on shelves.

Initial setup

To be fair, I knew before I bought them that Drobos use custom software for setup and configuration—not ideal, but they were very cheap. The Windows-only software is, like the units themselves, very pretty indeed, and initial impressions are of a very polished product. It even displays a little picture of the unit, showing the current state of all the lights.

In fairness, there is a lot less to configure than on the Thecus units, mainly because the B800i is not particularly feature-rich: it’s iSCSI or iSCSI. There is a hard limit of 16TB on volume size (irritating, and it feels arbitrary), plus a setting for the level of redundancy. Drobo calls its system ‘BeyondRAID’; I have no idea what that means in practice, and the only configurable option is whether the array should survive one or two disk failures. If Drobo is aiming these units at the enterprise, I think it needs to publish more detail rather than hide behind buzzwords.
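To see how the 16TB cap bites on an eight-disk unit, here is a rough sketch. It assumes that BeyondRAID’s overhead with identical 3TB disks behaves like RAID 5 (single redundancy) or RAID 6 (dual redundancy), and that the limit is in decimal TB; neither assumption comes from Drobo, so treat the numbers as ballpark only.

```python
import math

TB = 10**12
VOLUME_LIMIT = 16 * TB  # assumption: the 16TB cap is in decimal TB


def approx_usable(disks: int, disk_bytes: int, redundancy: int) -> int:
    """Approximate usable bytes, counting each level of redundancy as one disk
    lost to parity (RAID 5/6-style). BeyondRAID's real overhead may differ."""
    if redundancy not in (1, 2):
        raise ValueError("the B800i offers one- or two-disk redundancy")
    return (disks - redundancy) * disk_bytes


for redundancy in (1, 2):
    usable = approx_usable(8, 3 * TB, redundancy)
    volumes = math.ceil(usable / VOLUME_LIMIT)
    print(f"{redundancy}-disk redundancy: ~{usable / TB:.0f}TB usable, "
          f"so at least {volumes} volumes to present all of it")
```

Either way, a fully populated unit has to be carved into at least two volumes.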

The dearth of configurability is thrown into sharp relief when it comes to iSCSI itself. I was preparing to copy and paste the clients’ initiator names, when I found that there is nowhere for them to go. Now I’m quite prepared to be corrected on this, but as far as I can see there is no way of restricting client access! My units reside on a private storage network, but I still like to lock things down as best I can, and it’s a bit of a bum-clencher when I know that any of my clients can connect to the things. Luckily the data they’ll hold isn’t at all sensitive (or interesting ;)), but do bear this in mind if you’re putting them into an environment of dubious trustworthiness.
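For comparison, the initiator-based access control the B800i lacks boils down to an allow-list of iSCSI Qualified Names (IQNs), the `iqn.<yyyy-mm>.<reversed domain>[:<unique string>]` names defined in RFC 3720, checked when a client logs in. The sketch below is a minimal illustration, not any vendor’s implementation: the allow-list entries are hypothetical, the format check is deliberately loose, and names are simply lower-cased rather than run through the full iSCSI stringprep profile.

```python
import re

# Loose check of the RFC 3720 "iqn." format:
# iqn.<yyyy-mm>.<reversed domain>[:<unique string>]
IQN_RE = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.-]*(:.+)?$")

# Hypothetical allow-list of client initiator names.
ALLOWED_INITIATORS = frozenset({
    "iqn.1993-08.org.debian:01:abcdef123456",
    "iqn.1991-05.com.microsoft:archive-web01.example.org",
})


def initiator_allowed(name: str,
                      allowed: frozenset[str] = ALLOWED_INITIATORS) -> bool:
    """Accept only well-formed IQNs that appear on the allow-list."""
    name = name.strip().lower()  # simplification: compare case-insensitively
    return bool(IQN_RE.match(name)) and name in allowed


print(initiator_allowed("iqn.1993-08.org.debian:01:abcdef123456"))  # True
print(initiator_allowed("iqn.2011-09.org.example:rogue"))           # False
```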

Production and support

The good news is that I have far less to complain about when it comes to confidence-sapping problems. The units respond exactly as I would expect them to, and the GUI is, as far as it goes anyway, pretty good. The only hitch is that, unlike the Thecus units, which can send emails and speak SNMP (incorrect figures notwithstanding), the Drobos only talk to the GUI. This means that I now have a Windows VM that’s dedicated to monitoring the Drobos. The GUI can be configured to send emails when things go wrong, but it has to be running all the time, so I also have to monitor the VM to make sure that the GUI is running! This is all a bit of a faff, and far from ideal in an enterprise environment where I just want to slot stuff in and have it work. It does work once everything is sorted out, but I could have done without the extra complexity.
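Making sure the GUI is still running can itself be handed to the existing monitoring system. Below is a minimal sketch of a Nagios-style check (exit codes 0 for OK, 2 for CRITICAL, 3 for UNKNOWN) that runs on the Windows VM and looks for the GUI in the output of `tasklist /FO CSV /NH`. The image name `DroboDashboard.exe` is a hypothetical placeholder; check Task Manager for the real one.

```python
import csv
import io
import subprocess

# Nagios-style plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

# Hypothetical image name for the Drobo GUI; substitute the real one.
GUI_IMAGE = "DroboDashboard.exe"


def image_running(tasklist_csv: str, image: str) -> bool:
    """True if `image` appears in the first column of `tasklist /FO CSV /NH` output."""
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row and row[0].lower() == image.lower():
            return True
    return False


def main() -> int:
    """Run tasklist, print a one-line status and return the plugin exit code."""
    try:
        out = subprocess.run(
            ["tasklist", "/FO", "CSV", "/NH"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN - could not run tasklist: {exc}")
        return UNKNOWN
    if image_running(out, GUI_IMAGE):
        print(f"OK - {GUI_IMAGE} is running")
        return OK
    print(f"CRITICAL - {GUI_IMAGE} is not running")
    return CRITICAL
```

In a plugin wrapper, `sys.exit(main())` turns the result into the exit status the monitoring server expects. It only proves the process exists, not that it can still reach the Drobos, so it complements rather than replaces the GUI’s own email alerts.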

I’ve only contacted Drobo once, and was just as underwhelmed as with Thecus. The GUI uses active FTP to check for updates, which is unbelievably stupid: as we all know, active FTP is a firewalling nightmare, because the server has to open the data connection back to the client on whatever port the client nominates. Nor can the GUI use HTTP through a web proxy, which is the right way to do it, so I logged a call with Drobo. I wasn’t expecting much, as this was more an enhancement request than anything else, but I effectively just got ‘sorry, not supported’. I had to re-open the call to ask for this to be passed to R&D, which it apparently was. Grrrr again. I should not have had to do that.
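For anyone who hasn’t fought this particular battle: in active mode the client sends a PORT command over the control connection announcing its own address and a port, and the server then connects back to that address. A firewall can only let that inbound connection through if it parses the control channel, roughly as sketched here using the RFC 959 encoding (four address bytes, then the port as high and low bytes). The address and port in the example are hypothetical. Passive mode (PASV) avoids the inbound connection, and plain HTTP through a proxy sidesteps the whole problem.

```python
import ipaddress


def port_command(client_ip: str, data_port: int) -> str:
    """Build an RFC 959 PORT command: address bytes, then port high/low bytes."""
    addr = ipaddress.IPv4Address(client_ip)
    if not 0 < data_port < 65536:
        raise ValueError("port out of range")
    hi, lo = divmod(data_port, 256)
    octets = ",".join(str(b) for b in addr.packed)
    return f"PORT {octets},{hi},{lo}"


def parse_port_command(line: str) -> tuple[str, int]:
    """What a stateful firewall's FTP helper must do: recover the address and
    port the server is about to connect back to."""
    verb, _, arg = line.partition(" ")
    if verb.upper() != "PORT":
        raise ValueError("not a PORT command")
    parts = [int(p) for p in arg.split(",")]
    if len(parts) != 6 or not all(0 <= p <= 255 for p in parts):
        raise ValueError("malformed PORT argument")
    ip = ".".join(str(p) for p in parts[:4])
    return ip, parts[4] * 256 + parts[5]


print(port_command("192.168.10.20", 50123))  # PORT 192,168,10,20,195,203
```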

Conclusions

Shininess aside, the Drobo B800i is closer to production-ready than the Thecus, but only slightly. I feel confident enough to use it for actual live data, but the lack of basic features like iSCSI access control would make me think twice before buying another one. At least my Windows VM workaround only needs to be done once, should I expand my Drobo collection.

Final thoughts

How do I feel after all this?

Tired 🙂

I don’t regret buying all this stuff, as I now have a huge amount of disk space that cost a relatively small amount of money. I think my problem is that I expect far too much of people: when a product is released to the market, I expect it to be tested to within an inch of its life. I expect the documentation to be proofread and correct in every detail. I expect regular patches as problems are discovered and fixed. With the Thecus, I really felt like a beta-tester. The Drobo was clearly better, but it is badly thought out as an enterprise product, with critical features missing.

Update: here’s a great article comparing Thecus with units by Iomega, Netgear, QNAP and Synology.


2 Comments

  1. Dave

    Hi!

    Good article. I have been using Thecus, QNAP and Synology for cheap storage for several years. We use the systems mostly for non-production use (sysadmin storage, secondary archiving etc.).
    Last year we purchased an 8-drive QNAP to use as a cheap storage unit for a VMware server in one of our offices. The first thing I learnt was not to use iSCSI, preferably ever. We use NFS and so far it has worked fine.
    We were planning on using it for production, even though the servers it would handle are not vital and can easily be replaced.
    We had problems with the ACLs when not using advanced file mode: some servers just could not use the shares from the QNAP the way they could on a Windows server or another vendor’s NAS (Thecus, NetApp).
    I enabled advanced file mode ACLs and, well, I seem to have got them working, but they are horrible to work with. ACLs have so far worked fine on the Thecus boxes we have.

    We also have an older Thecus box from several years back, an i4500r: a simple iSCSI box that has worked great for the past 4 years without a hitch. So I have to give Thecus some props for this.

    As far as I can see, I will never buy a Thecus again: bad support, NO SSH access by default (this is so moronic and gets me so pissed off I just don’t want to think about it), and did I say the rails from Thecus are a joke? Worthless crap.

    I knew from the forums that support was bad, but I have actually downgraded support on Thecus and QNAP to terrible. Synology seems to get the idea regarding support and business customers a bit better.

    My best experience of the three has been with Synology. They might not have all the “features” the other two have, but they have performed very well.
    They have performed so well that we are thinking of using them for our upcoming secondary storage archive, if we can get them to work well with replication to a secondary system. They are also expandable without being too pricey.

    My rating of the three would be:
    1. Synology
    2. QNAP
    3. Thecus

    My best suggestion to anyone getting a larger setup from any of the three above is to buy at least one spare of the hard drive model you use in the system. For the larger units, get two or three spares, since they’re cheap. It will save you time and aggravation.

    Cheers!

    /Dave

    • flup

      Thanks for your detailed comment. I’m inclined to agree to some extent, although I am still a proponent of iSCSI. I’m going to be implementing SANsymphony-V shortly for cross-site redundancy, and that requires an iSCSI storage layer underneath.

      I certainly wouldn’t choose Thecus again, mainly because their support is just appalling. I’m treated like a home user even though I bought their top-end enterprise product. I’ve had an active call about a RAID rebuild failure for about three weeks with no response at all (in the end I just fixed the problem myself; I couldn’t wait any longer).

      I think the main lesson I’ve learned is that you can’t get around paying for decent support.
