First of all: what is StorageGRID Webscale? It is a software-defined, object-based storage platform from NetApp. It dates back to about 2000, when it was known as Bycast StorageGRID; in 2010 NetApp bought the technology and gave it its new name. It can be accessed via the Amazon S3 or OpenStack Swift protocols. With the “NAS Bridge” you can also use NFS and CIFS.
A few months ago we ran a PoC with five NetApp SG5712 appliances (each with 12x 10 TB SATA drives) and built a 5-node StorageGRID Webscale cluster. This cluster had about 400 TB of usable space. For client access, each node was connected to our network switch with 2x 10 GbE LACP. For management you need to deploy an Admin Node; in our case that was a virtual machine on our VMware vSphere cluster.
A simple StorageGRID Webscale environment could look like this:
In this example you have five Storage Nodes, one Gateway Node and an Admin Node; each node has an IP address in the GRID Network. You can separate client and admin traffic into their own subnets, but you don’t have to. For client access you can connect directly to one of the Storage Nodes, but for performance reasons you should load-balance across all Storage Nodes. This can be done with an HTTPS load balancer or, for smaller environments, by deploying one or more Gateway Nodes. These Gateway Nodes are virtual machines that are connected to the GRID Network and know all Storage Nodes from the GRID configuration, so this load balancer needs no manual configuration. If you deploy multiple Gateway Nodes, you can build a very simple but fast environment with DNS round-robin across the Gateway Nodes. Every client should then use the DNS name of the gateways. The Admin Node is, as the name says, for administration. There you monitor your GRID, install updates, add new nodes or create tenants (S3 consumers); it is the single point of login for your whole StorageGRID Webscale system.
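To make the DNS round-robin idea a bit more concrete, here is a minimal Python sketch of what it looks like from the client side: one name resolves to several Gateway Node addresses, and requests are spread across them in turn. The rotation itself is normally done by the DNS server; the helper names and the IP addresses below are made up for illustration and are not part of StorageGRID.

```python
import itertools
import socket


def resolve_all(hostname, port=443):
    """Return the distinct IP addresses that DNS publishes for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        address = info[4][0]  # sockaddr tuple starts with the IP address
        if address not in addresses:
            addresses.append(address)
    return addresses


def round_robin(addresses):
    """Return an endless iterator that rotates over the given addresses."""
    return itertools.cycle(addresses)


if __name__ == "__main__":
    # Hypothetical Gateway Node addresses; in a real setup they would come
    # from resolve_all() called with the DNS name of your gateways.
    gateways = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
    picker = round_robin(gateways)
    for request_number in range(5):
        print(f"request {request_number} -> {next(picker)}")
```

With three addresses behind one name, every Gateway Node ends up with roughly a third of the client connections, without any extra load balancer in front.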
In the default configuration your data has no redundancy, but you can choose between replication and erasure coding. Protecting your objects via replication means that StorageGRID stores each object more than once on different Storage Nodes and, if needed, at different locations. Protection with erasure coding means that StorageGRID splits each object into multiple data and parity fragments. Both methods keep your data available if one Storage Node is down or has broken disks.
For the performance benchmarks we used S3Tester on a Debian Linux VM with 64 vCPUs, 16 GB RAM and 10 GbE. Every test was run with a concurrency of 128.
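The practical difference between the two methods is how much raw capacity they cost. The following sketch compares the storage overhead of full copies with that of a data+parity scheme. The concrete schemes used here (2 copies, 2+1, 4+2) are only illustrative assumptions and not necessarily the ones StorageGRID offers, and the calculation assumes that every copy or fragment lands on a different Storage Node.

```python
def replication_overhead(copies):
    """Raw bytes stored per byte of user data when keeping full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def erasure_coding_overhead(data_fragments, parity_fragments):
    """Raw bytes stored per byte of user data for a data+parity scheme."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("invalid fragment counts")
    return (data_fragments + parity_fragments) / data_fragments


if __name__ == "__main__":
    # Illustrative schemes only (assumptions, not taken from the PoC).
    print("2 copies:", replication_overhead(2), "x raw capacity")
    print("EC 2+1  :", erasure_coding_overhead(2, 1), "x raw capacity")
    print("EC 4+2  :", erasure_coding_overhead(4, 2), "x raw capacity")
```

Under these assumptions, two copies and a 2+1 scheme both survive the loss of one Storage Node, but erasure coding needs less raw capacity for it, at the price of more work when objects are written and read.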
10 kB, 20,000 files
Total number of requests: 19968
Total number of unique objects: 19968
Total elapsed time: 6.877714341s
Average request time: 39.965068ms
Standard deviation: 27.022912ms
Minimum request time: 8.897601ms
Maximum request time: 791.100072ms
Nominal requests/s: 3202.8
Actual requests/s: 2903.3
Content throughput: 28.352442 MB/s
10 MB, 2,000 files
Total number of requests: 1920
Total number of unique objects: 1920
Total elapsed time: 25.605430224s
Average request time: 1.520233645s
Standard deviation: 807.182087ms
Minimum request time: 343.246636ms
Maximum request time: 5.325922839s
Nominal requests/s: 84.2
Actual requests/s: 75.0
Content throughput: 749.840945 MB/s
100 MB, 1,000 files
Total number of requests: 896
Total number of unique objects: 896
Total elapsed time: 1m52.380382266s
Average request time: 14.434036624s
Standard deviation: 3.131988067s
Minimum request time: 4.553443282s
Maximum request time: 24.256991903s
Nominal requests/s: 8.9
Actual requests/s: 8.0
Content throughput: 797.292180 MB/s
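If you want to check how the summary lines relate to each other, the following sketch recomputes them from the raw figures above. The formulas are inferred from the numbers and not taken from the S3Tester documentation: the results line up if the object sizes are binary (10 KiB, 10 MiB, 100 MiB), the throughput is in MiB/s, and the nominal rate is the concurrency divided by the average request time.

```python
KIB = 1024
MIB = 1024 * 1024
CONCURRENCY = 128

# (label, requests, object size in bytes, elapsed seconds, avg request seconds)
RUNS = [
    ("10 kB", 19968, 10 * KIB, 6.877714341, 0.039965068),
    ("10 MB", 1920, 10 * MIB, 25.605430224, 1.520233645),
    ("100 MB", 896, 100 * MIB, 112.380382266, 14.434036624),
]


def actual_rps(requests, elapsed_s):
    """Completed requests divided by wall-clock time."""
    return requests / elapsed_s


def nominal_rps(concurrency, avg_request_s):
    """Rate if every worker issued requests back to back (inferred formula)."""
    return concurrency / avg_request_s


def throughput_mib_s(requests, object_bytes, elapsed_s):
    """Payload bytes per second, expressed in MiB/s."""
    return requests * object_bytes / elapsed_s / MIB


if __name__ == "__main__":
    for label, requests, size, elapsed, avg in RUNS:
        print(
            f"{label:>6}: nominal {nominal_rps(CONCURRENCY, avg):.1f} req/s, "
            f"actual {actual_rps(requests, elapsed):.1f} req/s, "
            f"throughput {throughput_mib_s(requests, size, elapsed):.6f} MiB/s"
        )
```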
The bottleneck in the last two tests was the 10 GbE connection of the test VM. If you spread the test across multiple VMs, the aggregate throughput should be considerably higher.
NetApp StorageGRID Webscale is a nice new technology for storing a massive number of objects. It scales almost without limit: you don’t have to keep a storage limit in mind, you just add more Storage Nodes. Also think about the FabricPool feature and the fact that FAS/AFF systems aren’t able to provide the S3 protocol; StorageGRID Webscale closes this gap in the NetApp Data Fabric perfectly. You can choose between hardware appliances, Docker containers or VMware vSphere VMs. You can span your GRID across multiple racks, datacenters, cities or even continents.
This was just a very short first look at StorageGRID Webscale. It can do many more nice things … 😉
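To relate the measured throughput to the 10 GbE link, this small sketch converts the payload rates into Gbit/s. It assumes that the reported throughput is in MiB/s and only counts object payload; HTTP, TLS, TCP and VM networking overhead come on top, so treat the result as a rough orientation and not as a measurement of link utilisation.

```python
MIB = 1024 * 1024


def payload_gbit_s(throughput_mib_s):
    """Convert a payload rate from MiB/s to decimal Gbit/s."""
    return throughput_mib_s * MIB * 8 / 1e9


def link_share(throughput_mib_s, link_gbit_s=10.0):
    """Fraction of the nominal link speed used by payload alone."""
    return payload_gbit_s(throughput_mib_s) / link_gbit_s


if __name__ == "__main__":
    # Throughput figures of the 10 MB and 100 MB runs from the benchmark.
    for label, rate in [("10 MB", 749.840945), ("100 MB", 797.292180)]:
        print(
            f"{label:>6}: {payload_gbit_s(rate):.2f} Gbit/s payload, "
            f"{link_share(rate):.0%} of a 10 GbE link"
        )
```

Under these assumptions the two large-object runs move roughly 6.3 and 6.7 Gbit/s of pure payload through the single test VM.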