January 31, 2020 at 10:47PM
Last week I attended Storage Field Day 19. As always happens at these events, there are trends you can easily spot by following the presentations and connecting the dots. In my opinion, no matter what any single vendor says, sustainable data storage infrastructures are built on different tiers, and you need a smart mechanism to move data across those tiers seamlessly.
Why tiers?
There isn’t a lot to say here. The storage industry now offers numerous types of media, and it is practically impossible to build your storage infrastructure on just one of them.
From storage-class memory, or even DRAM, down to tape, every storage tier has its reason to exist. Sometimes it is about speed or capacity; in other cases it is about a good balance between the two. In the end, it is always about cost. No matter how scalable your storage product is, it is highly unlikely that you will be able to do everything with flash alone, or with disks alone.
Even when we envision the all-flash data center, the reality is that we will have multiple tiers: multiple types of flash locally, and the cloud for long-term data retention. The all-flash data center is a utopia from this point of view. Not that it wouldn’t be possible to build, it is just too expensive. And it is no news that cold data in the cloud ends up on slow disk systems or tape. Yes, tape.
Again, we are producing more data than ever, and every prediction for the next few years points to further acceleration. The only way to sustain capacity and performance requirements is to work intelligently on how, when, and where data is placed. Finding the right combination of performance and cost is not difficult, especially with the analytics tools now available in the storage systems themselves.
Last year I wrote two reports for GigaOm on these exact topics (here and here), and I’m already working on a new research project about cloud file systems that will start from a similar premise.
How, When & Where
I want to work through examples here, and I’ll borrow some content from SFD19 to do that.
In small enterprise organizations, the combination of flash and cloud is becoming very common: all-flash on-prem, and cloud for the rest of your data. The reason is simple: SSDs are now big enough and cheap enough to keep all active data online. In fact, when you buy a new server, flash memory is likely the first, and probably the only, option for building a balanced system. And because of the nature of these organizations, the cloud tends to absorb most of the data they produce. Backups, file services, collaboration tools: they are all migrating to the cloud now, and hybrid solutions are more common than ever.
Tiger Technology has a solution, a filter driver for Windows servers, that does the trick. It’s simple, seamless, and smart. This software component intercepts all file activity on your servers and places data where it is needed, finding the best compromise between performance, capacity, and cost. At the end of the day, it is a simple, cost-effective, efficient, easy-to-manage solution that is totally transparent to end users. The use cases presented during the demo include video surveillance, where several concurrent streams need a lot of throughput but the data is rarely accessed again after being written, so moving it quickly to the cloud keeps costs low while preserving a good retention policy.
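Tiger Technology didn’t share code at SFD19, but the placement logic at the heart of this kind of tiering can be sketched in a few lines. The following Python snippet is a purely illustrative, minimal sketch of an age-and-size-based policy; the thresholds, the sample path, and the scan loop are my own assumptions, not Tiger Technology’s actual implementation (which runs as a filter driver in the Windows I/O path, not as a script).

```python
import os
import time

# Hypothetical thresholds; a real product would apply policies per share
# or per workload rather than hard-coded values like these.
COLD_AFTER_DAYS = 30            # files untouched this long count as "cold"
MIN_SIZE_BYTES = 10 * 1024**2   # don't bother tiering tiny files

def is_cold(path, now=None):
    """Return True if a file looks like a candidate for the cloud tier."""
    now = now or time.time()
    st = os.stat(path)
    age_days = (now - st.st_atime) / 86400
    return age_days > COLD_AFTER_DAYS and st.st_size >= MIN_SIZE_BYTES

def scan(root):
    """Yield files that this toy placement policy would move off primary storage."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if is_cold(path):
                    yield path
            except OSError:
                continue  # file vanished or is inaccessible; skip it

if __name__ == "__main__":
    # Hypothetical video-surveillance share used only for the example.
    for candidate in scan(r"D:\surveillance"):
        print("would move to cloud tier:", candidate)
```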
The same goes for the large enterprise. Komprise is a startup that applies a similar concept to large-scale infrastructures made up of different storage systems and servers. The result is similar, though: in a matter of hours, Komprise begins to move data to object stores and the cloud, freeing precious space on primary storage systems while creating a balanced system that takes access speed, capacity, and cost into account. By analyzing the entire data domain of the enterprise, Komprise can do much more than just optimize data placement, but that is a discussion for another post. Here we are talking about the easy-to-grab, low-hanging fruit that comes with the adoption of this type of solution. Check out their demo at SFD19 to get an idea of the potential.
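The analytics side is easy to underestimate. As a rough illustration of the kind of first-pass assessment these tools perform, here is a small, hypothetical Python sketch that walks a file share and sums capacity by access-age bucket; the bucket boundaries and the mount point are invented for the example and have nothing to do with Komprise’s actual engine.

```python
import os
import time
from collections import defaultdict

# Hypothetical age buckets (days); real analytics tools let you tune these.
BUCKETS = [(30, "hot (<30d)"), (180, "warm (30-180d)"), (float("inf"), "cold (>180d)")]

def bucket_for(age_days):
    for limit, label in BUCKETS:
        if age_days <= limit:
            return label

def summarize(root):
    """Roughly estimate how much capacity sits in each access-age bucket."""
    totals = defaultdict(int)
    now = time.time()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable or vanished file; ignore it
            age_days = (now - st.st_atime) / 86400
            totals[bucket_for(age_days)] += st.st_size
    return totals

if __name__ == "__main__":
    # "/mnt/filer" is a made-up mount point for the example.
    for label, size in summarize("/mnt/filer").items():
        print(f"{label}: {size / 1024**3:.1f} GiB")
```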
One more example comes from a company that primarily works with high-performance workloads: Weka. These guys developed a file system that performs incredibly well for HPC, AI, big data, and every other workload that really needs speed and scale. To deliver this kind of performance, they designed it around the latest flash technology. But even though the file system can scale to incredible numbers on its own, it can also leverage object storage on the back end to store unused blocks. Again, it is a brilliant mechanism for pairing performance with capacity and keeping the overall infrastructure cost reasonable without sacrificing usability. The demo is eye-opening about the performance capabilities of the product, but it is the presentation of a recent customer case study that gives the most complete picture of what is possible in the real world.
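Weka obviously didn’t describe its tiering engine in these terms, but the general pattern of keeping hot blocks on flash and demoting cold ones to an object store can be modeled with a deliberately naive sketch. Everything below (the class, the LRU policy, the in-memory dictionary standing in for an object bucket) is a hypothetical toy, not Weka’s design.

```python
from collections import OrderedDict

class TieredBlockStore:
    """Toy model of a flash tier that demotes least-recently-used blocks
    to a (stubbed) object store once a capacity limit is hit."""

    def __init__(self, flash_capacity_blocks):
        self.capacity = flash_capacity_blocks
        self.flash = OrderedDict()   # block_id -> data, ordered by recency
        self.object_store = {}       # stand-in for an S3-style bucket

    def write(self, block_id, data):
        self.flash[block_id] = data
        self.flash.move_to_end(block_id)
        self._demote_if_needed()

    def read(self, block_id):
        if block_id in self.flash:
            self.flash.move_to_end(block_id)   # refresh recency
            return self.flash[block_id]
        data = self.object_store[block_id]     # "rehydrate" from the object tier
        self.write(block_id, data)             # promote it back to flash
        return data

    def _demote_if_needed(self):
        while len(self.flash) > self.capacity:
            cold_id, cold_data = self.flash.popitem(last=False)
            self.object_store[cold_id] = cold_data

store = TieredBlockStore(flash_capacity_blocks=2)
store.write("b1", b"...")
store.write("b2", b"...")
store.write("b3", b"...")   # b1 gets demoted to the object tier
print(store.read("b1"))      # transparently promoted back to flash
```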
And There Is More
I’m planning to write separately about Western Digital and some of the great stuff I saw during their presentation, but in this post, I’d like to point out a couple of facts around multi-tier storage.
Western Digital, one of the market leaders in both flash and hard disk drive technology, hasn’t stopped developing hard drives. Quite the contrary, actually: the capacity of these devices will keep growing over the coming years, accompanied by a series of mechanisms to optimize data placement.
WD is a strong believer in SMR (Shingled Magnetic Recording) and zoned storage. SMR overlaps adjacent tracks to increase areal density, at the price of requiring largely sequential writes, and zoned storage exposes that constraint to the host so software can organize its writes accordingly. Together, these two technologies are quite interesting in my opinion and will allow users to further optimize data placement in large-scale infrastructures.
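To see why zoned storage changes how hosts place data, it helps to picture a single zone: writes must land at the zone’s write pointer, and space is reclaimed by resetting the whole zone rather than overwriting in place. The tiny Python model below is my own simplification for illustration; real zoned devices expose this behavior through dedicated command sets, not a class like this.

```python
class Zone:
    """Simplified model of a zone on an SMR/zoned device: writes must land
    exactly at the write pointer, and space is reclaimed by resetting the
    whole zone, never by overwriting in place."""

    def __init__(self, size_blocks):
        self.size = size_blocks
        self.write_pointer = 0
        self.blocks = [None] * size_blocks

    def append(self, data_blocks):
        if self.write_pointer + len(data_blocks) > self.size:
            raise IOError("zone full: open a new zone instead")
        for block in data_blocks:
            self.blocks[self.write_pointer] = block
            self.write_pointer += 1

    def write_at(self, offset, block):
        if offset != self.write_pointer:
            raise IOError("random writes are not allowed in a sequential zone")
        self.append([block])

    def reset(self):
        """Discard the zone's contents so it can be rewritten from the start."""
        self.write_pointer = 0
        self.blocks = [None] * self.size

zone = Zone(size_blocks=4)
zone.append([b"a", b"b"])      # fine: sequential
zone.write_at(2, b"c")         # fine: lands exactly on the write pointer
# zone.write_at(0, b"x")       # would raise: in-place overwrite is rejected
```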
It is always important to look at what companies like WD have in mind and are developing to get an idea of what is coming in the next few years, and it is clear that we will see some interesting things happening around the integration of different storage tiers (more on this soon).
Takeaways
To build a sustainable storage infrastructure that provides performance, capacity, and scalability at a reasonable price, storage tiering is the way to go.
Modern, automated tiering mechanisms offer much more than optimized data placement. They constantly analyze data and workloads, and they can quickly become a key component of a powerful data management tool (look at Komprise, for example).
Because of the growing scale of storage infrastructures and the way we consume data in hybrid cloud environments, data management (including automatic tiering) and storage automation now matter far more than any single storage system when it comes to keeping real control over data and costs. Here is another GigaOm report on unstructured data management that offers a clearer idea of how to face this kind of challenge.
Stay tuned for more…