Indexing is the missing link to ILM
Posted in General, Storage Applications, Storage Management, Advisor - Joe Disher by Joe Disher
Most IT managers have had to start thinking about data categorization and indexing services… and for anyone who has paid attention to the ever-confusing 3-letter acronyms that describe data management and archival in storage (CDP, SRM, ILM, etc.), this all probably seems like more information targeted to confuse the masses on how to deal with their growing storage problems.
ILM, or Information Lifecycle Management, has promised for a long time now to give control of a company’s vital digital assets back to the organization by providing policies and automated data movement services. Most of the ILM products out there key off of data access patterns (last modified/accessed dates) to decide when it’s appropriate to migrate data to secondary storage or to some WORM device for permanent archival. Further intelligence must be provided by the IT administrators and the owners of the data to make sure that data is stored appropriately to begin with. What a pain for busy administrators and end users who don’t see the benefit or don’t have the time to really think through how their data needs to be stored! And let’s not forget that just because it makes sense to store the corporate financials on “Server A” today doesn’t mean they shouldn’t be on “Server B” tomorrow. Companies’ data and IT infrastructures continually evolve, and storage management needs change right along with them.
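To make the access-pattern approach concrete, here is a minimal sketch of a policy that looks at nothing but the last-accessed date. The tier names and the 180-day and 730-day thresholds are made up for illustration and don’t come from any particular ILM product.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
SECONDARY_AFTER = timedelta(days=180)  # idle for roughly 6 months
ARCHIVE_AFTER = timedelta(days=730)    # idle for roughly 2 years


def tier_by_age(last_accessed: datetime, now: datetime) -> str:
    """Pick a storage tier from the last-access date alone."""
    idle = now - last_accessed
    if idle >= ARCHIVE_AFTER:
        return "archive"
    if idle >= SECONDARY_AFTER:
        return "secondary"
    return "primary"


if __name__ == "__main__":
    now = datetime(2006, 11, 30)
    print(tier_by_age(datetime(2004, 1, 1), now))
```

Notice that the function has no idea what the file actually is. That is exactly why the administrators and data owners have to supply the rest of the intelligence by hand.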
Enter “Data Indexing Services” or “DIS” - (just made that up! kinda like it… and not just because it’s the 1st 3 letters of my last name or because I’ve been in the storage industry for so long that I have an uncontrollable need to apply a 3-letter acronym to everything I talk about. It’s true - just ask the “WIC” - that’s my wife, the “Woman In Charge”.) Sorry! Distracted by acronyms again!
Anyway - Data Indexing Services manifest themselves in a number of different ways, but the main benefit is automated indexing and data categorization. In short, indexing applications automatically create “data tags” that can ease the pain of finding data in the hundreds of places we all store our stuff. Imagine all of our email, “My Documents” data, and every file server in our companies having real-world, useful “color codes” to help us find what we’re looking for. Now extend that to the difficult promise of ILM to “intelligently” manage the lifecycle of data based on policy. We all know that just because a file hasn’t been accessed for 2 years doesn’t mean it should automatically be archived offsite. It may need to stay put on primary storage indefinitely, with replicated copies in multiple places - even if it’s never accessed.
Let’s now take some of the automatic data categorization from “DIS” and use those categories to apply policy to the lifecycle of data. Now administrators can set policy based on something more than access and modify dates!
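Here is a rough sketch of what a tag-aware policy could look like. The tag names, tier names, and the rule table are all hypothetical; in practice the tags would come from whatever indexing application you use, and the rules from your own business requirements.

```python
from datetime import datetime, timedelta

ARCHIVE_AFTER = timedelta(days=730)  # hypothetical 2-year idle threshold

# Hypothetical tag-to-tier rules. The first matching rule wins.
TAG_RULES = {
    "financials": "primary-replicated",  # stays put even if never accessed
    "legal-hold": "worm",
}


def tier_for(tags: set, last_accessed: datetime, now: datetime) -> str:
    """Apply tag rules first, then fall back to the idle-time rule."""
    for tag, tier in TAG_RULES.items():
        if tag in tags:
            return tier
    if now - last_accessed >= ARCHIVE_AFTER:
        return "archive"
    return "primary"


if __name__ == "__main__":
    now = datetime(2006, 11, 30)
    print(tier_for({"financials"}, datetime(2004, 1, 1), now))
```

A file tagged “financials” stays on replicated primary storage no matter how long it sits untouched, while an untagged file of the same age still falls through to the archive.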
If some of the new indexing applications live up to the hype, and lifecycle management applications take advantage of these new “data tags” without overburdening IT or the end users, ILM may actually have a chance of making it to the mainstream. Maybe not… but we can all dream, right!
Blog ya later!
Joe
November 30th, 2006 at 2:41 pm
This looks very promising.
Do you know any COTS (Commercial Off The Shelf) products or vendor names that are available today?
How scalable vertically and horizontally is the “DIS” solution?
Will it scale from the SOHO to the Enterprise? Enterprise only?
Will it scale from Desktops to HPC clusters? HPC only?
How is this different from a metadata solution like http://www.njini.com/
or CAS (Content Addressed Storage) solutions
or “tagging” solutions?
In the way it is Implemented?
For example, how does the “DIS” tell an application index file from an application data file?
I don’t expect an in-depth answer. I can dig that out for myself. Just a quick go-by. I know, I could dig that out for myself too.
December 6th, 2006 at 9:00 am
Robert - Thanks for the questions.
Scalability really depends on the application. There are many COTS products out there for data indexing of unstructured data with varying scalability limits. Both Microsoft and Google have free offerings for the desktop. Google even has an Enterprise appliance offering. Solutions that are more business-centric come from Documentum and many others. I hadn’t heard of Njini before, but they seem to be focused on tailoring their software around business requirements also.
CAS is a complementary technology that has other properties tied more to regulatory requirements. The “DIS” component creates the metadata indexes, while CAS stores the data securely, sometimes using de-duplication and/or WORM.
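To show how the two pieces fit together, here is a toy in-memory sketch. It assumes the content address is simply the SHA-256 hash of the data and that the index is a plain tag-to-address lookup; the class and function names are invented for this example, and real CAS products differ in plenty of details.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address is the SHA-256 of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is kept once
        # (de-duplication) and an existing blob is never overwritten.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def __len__(self) -> int:
        return len(self._blobs)


def index_and_store(store: ContentStore, index: dict, data: bytes, tags: set) -> str:
    """Store the data, then record its address under each tag (the "DIS" part)."""
    address = store.put(data)
    for tag in tags:
        index.setdefault(tag, set()).add(address)
    return address


if __name__ == "__main__":
    store, index = ContentStore(), {}
    first = index_and_store(store, index, b"Q3 ledger", {"financials"})
    second = index_and_store(store, index, b"Q3 ledger", {"audit"})
    print(first == second, len(store))
```

Saving the same bytes twice yields one stored copy with two tags pointing at it, so the index answers “what is this data?” and the store answers “where is it, and has it changed?”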
All of these new applications have their own strengths and weaknesses. Keep your specific requirements in mind and be diligent with your research (I recommend evaluating the products you are interested in). Oh - and don’t get suckered into a really cool feature that you’ll never use! That never happens, right!
Good luck with your research! This market segment is still evolving, so it can be quite confusing.
Blog ya later!
Joe