Classification or Search?

A couple of days ago, there was an interesting post by Michael Schrage in which he questioned the need for information classification in today’s (mostly electronic) world. I often hear the same opinion from people who rely primarily on MS Outlook for storing and searching their documents. Apart from the fact that it rubs IT administrators and records managers the wrong way, there is some merit in his way of thinking. People usually get what they want: the information can be found easily and is readily accessible.

But why is it like this, and is it applicable to all documents? First of all, we live in a world where information governance lies somewhere on a continuum between total ‘anarchy’, where all documents live unorganized in one place, and ‘tyranny’, where every document is classified and tracked from the moment it is created. One end of the spectrum could be considered the home of free-spirited, right-brain people, the other that of left-brain bureaucrats, or ‘Type As’ as Schrage describes them. Reality lies somewhere in between; each of us leans to a smaller or larger degree towards one end of the spectrum or the other. My personal belief is that, for individuals as for organizations, being really productive and creative requires balancing on the edge between chaos and tyranny.  To Schrage’s point, people quite often waste their time classifying information that does not have to be classified. But then why do we classify in the first place? There are a couple of objectives. The first one is the most obvious: to find information easily, and this is what Schrage is referring to.

Not long ago, when documents existed only in physical form, people invented classification to locate and find information. A good example is the Dewey Decimal Classification system used in libraries. First you locate a book based on its class and subject; once you have found it, you use its index to find information within it. Electronic documents pushed the limits of such systems further, giving new capabilities and opportunities for search.

In the case of my personal account with MS Outlook or Twitter, Schrage is right. The value of classifying my emails for the purpose of search is low. Outlook is pretty good and flexible, allowing me to locate the information I need fairly quickly. But why is it like this? It happens primarily because MS Outlook automatically captures all the metadata needed to describe the context of an email, with me spending no time on it. Sender address, date sent, date received, subject and content are all searchable. Additionally, the email thread functionality makes it easier to dig deeper into messages when needed. This works so well because I am intimately familiar with my emails and can easily recollect the information and associate it with its context. It would not be the same if I inherited a mailbox from someone else. Although search might help narrow the results, I would need more to figure out what a message is about and whether it corresponds to what I am looking for. So, as per Schrage’s point, this does work for my personal productivity, but it will not help in an organization where I have to collaborate.
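
To make the point about automatically captured metadata concrete, here is a minimal sketch of a search that filters on sender, date and text. It does not use Outlook's actual API; the `Message` class, the `search` function and the sample mailbox are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Message:
    # Metadata an email client typically captures without user effort.
    sender: str
    sent: date
    subject: str
    body: str


def search(messages, sender=None, sent_after=None, text=None):
    """Return messages matching every criterion that was supplied."""
    results = []
    for msg in messages:
        if sender is not None and msg.sender.lower() != sender.lower():
            continue
        if sent_after is not None and msg.sent <= sent_after:
            continue
        if text is not None:
            haystack = (msg.subject + " " + msg.body).lower()
            if text.lower() not in haystack:
                continue
        results.append(msg)
    return results


# Hypothetical mailbox used only for illustration.
mailbox = [
    Message("anna@example.com", date(2011, 3, 1), "Budget draft", "First version attached."),
    Message("bob@example.com", date(2011, 3, 5), "Re: Budget draft", "Comments on the budget."),
    Message("anna@example.com", date(2011, 4, 2), "Lunch", "Friday at noon?"),
]

hits = search(mailbox, sender="anna@example.com", text="budget")
print([m.subject for m in hits])
```

Nothing in this search tells a stranger what the "Budget draft" message was for, which is the gap that author-supplied metadata would have to fill in a shared or inherited mailbox.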

So, although I agree that classification is not needed here, and as a matter of fact could even be restrictive, the key to success is the metadata describing the content. In the case of Outlook, as I already mentioned, some of it is captured automatically. In other cases, however, the metadata needs to be added to keep the context with the content. This could be done manually, but that is what most people perceive as a ‘waste’ activity. It could be done automatically, and to some degree this is possible, as with MS Office documents. However, there will still be some metadata that only the author can decide on, as it corresponds to his or her intentions. Additionally, the metadata itself may need its own classification or hierarchy to be meaningful.

So search and findability are one of the objectives of classification. Another one, especially important for organizations, is records classification. Records should be kept for the periods of time prescribed in retention schedules, which are usually based on document type classification. So here classification is not going to disappear.
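
The link between document type and retention can be sketched in a few lines. This is only an illustration: the document types and retention periods below are invented, and a real schedule would come from the organization's legal and records management staff.

```python
from datetime import date

# Hypothetical retention schedule: document type -> years to keep.
RETENTION_YEARS = {
    "invoice": 7,
    "contract": 10,
    "meeting_minutes": 3,
}


def disposal_date(doc_type, closed_on):
    """Return the earliest date on which a record of this type may be disposed."""
    years = RETENTION_YEARS[doc_type]  # a KeyError flags an unclassified document
    try:
        return closed_on.replace(year=closed_on.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall forward to 1 March.
        return date(closed_on.year + years, 3, 1)


def is_due_for_disposal(doc_type, closed_on, today):
    """True once the retention period for this record has passed."""
    return today >= disposal_date(doc_type, closed_on)


print(disposal_date("invoice", date(2010, 6, 30)))
```

Without the document type, the lookup fails, which is exactly why this kind of classification cannot be replaced by search alone.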

In summary, I agree that the importance of classification will diminish as technology evolves. Automatic classification will definitely help, but it is not there yet. As artificial intelligence tools become more truly ‘intelligent’ and systems become more capable of analyzing the content of the data, the need for manual classification will shrink. But the real purpose behind the scenes will remain: the accuracy and completeness of the metadata. Tools like Google Search or SharePoint 2010 with the FAST search engine are on the right track to narrow the search scope and mine the results. The ability to use enterprise keywords, together with good search analytics, will help with findability. The need for classification will not disappear, but it will become of limited importance to most users.

Legal, statutory and regulatory foundation for Information Management programs

Any successful implementation of an information management solution requires establishing a proper IM framework. Such a framework helps with forming governance, setting priorities and defining constraints, and gives overall direction to any future information programs.

The foundation of such a framework consists of existing legal, statutory and regulatory requirements. Establishing this basis, especially in larger organizations, is not an easy task and requires the involvement of several parties.  I made an attempt to capture some of the laws, standards and regulations used in the US and in Canada. This list is far from exhaustive; every organization, depending on its type of business, will have to establish its own baseline, which will include specific industry regulations.

United States:

Law, Statute, Regulation | Short Description
Sarbanes-Oxley (SOX) 404 and 409 – Corporate and Auditing Accountability and Responsibility Act | SOX deals with monitoring the creation and management of financial records, as well as disclosing information about changes in the financial condition or operations of the organization. It primarily affects publicly traded companies, including accounting and security firms, auditors and brokers.
Health Insurance Portability and Accountability Act (HIPAA) | HIPAA covers the protection of individually identifiable health information. It requires organizations handling such personal information to notify patients about their privacy policies. Organizations affected include health plans and health care providers.
Children’s Online Privacy Protection Act (COPPA) | COPPA requires that online content providers working with audiences that include children use reasonable procedures to ensure that the child’s parent is included in the process.
Department of Defense 5015.2 (DoD 5015.2) | DoD 5015.2 identifies requirements, based on operational, legal and legislative needs, that vendors of records management solutions must fulfill. It affects software vendors of electronic document and records management systems. Several government offices in the US require compliance with this standard, and some other, larger organizations implementing information management systems often use it during the selection process. For this purpose, the standard is often used outside of the US as well.
Securities Exchange Act (SEC Rule 17a-3 and 17a-4) | The SEC rules outline requirements for data retention, classification and accessibility for organizations involved in financial securities trading.
Gramm-Leach-Bliley Act | The act regulates the handling and sharing of personal information, and the disclosure of privacy policies to consumers. It primarily affects financial services organizations.
IRS Rev. Proc. 97-22 | This guideline includes directives for taxpayers on maintaining financial books and records using software applications.
Electronic Signatures in Global and National Commerce Act (ESIGN) | This act regulates the use of electronic records and signatures in commercial transactions.
Fair and Accurate Credit Transactions Act (FACTA) | It allows consumers to request and obtain a free credit report every 12 months. It also contains provisions to reduce identity theft and to secure the disposal of consumer information. Financial institutions are mainly affected by this act.
Fair Credit Reporting Act (FCRA) | FCRA regulates the collection, distribution and use of consumer information, including credit information. It affects consumer credit reporting organizations.
Freedom of Information Act (FOIA) | It guarantees full or partial access to previously unreleased information and documents controlled by the US government.
Government Paperwork Elimination Act (GPEA) | This act requires federal agencies, where practicable, to use electronic forms, filing and signatures to conduct official business.
Occupational Safety and Health Act (OSHA) | OSHA governs occupational health and safety in the private sector and federal government.
Uniform Electronic Transactions Act (UETA) | The purpose of this act is to harmonize differing State laws on the retention of paper records and the validity of electronic signatures. It supports the validity of electronic contracts.

 

Canada:

Law, Statute, Regulation | Short Description
Personal Information Protection and Electronic Documents Act (PIPEDA) | It governs how private companies collect, use and disclose personal information in the course of conducting business.
Secure Electronic Signature Regulations (SOR/2005-30) | These regulations stipulate how digital signatures are created and verified. They are related to the Canada Evidence Act, which deals with the integrity and validity of electronic documents.
Access to Information Act | Regulates full or partial access to previously unreleased information and documents controlled by the Canadian government.
Privacy Act | This act stipulates rules on how the federal government must deal with personal information.
Limitations Act | The Limitations Act defines the period of time during which legal proceedings may be initiated, and thus influences the definition of retention periods.
Ontario Bill 198 | It regulates securities issued in the province of Ontario. It roughly corresponds to Sarbanes-Oxley in the US.
Microfilm and Electronic Images as Documentary Evidence Standard | This standard deals with microfilming and electronic image capture. It also describes the process of establishing a program that helps ensure document integrity, reliability and authenticity.
Electronic Records as Documentary Evidence Standard | This standard sets out provisions to ensure that electronic information is trustworthy, reliable and authentic.

 

It is important to remember that establishing such a baseline requires deep involvement of the legal department and several business subject matter experts. Since laws and regulations change from time to time, the organization should appoint a steward responsible for maintaining the framework, and establish a governance model describing what to do when such laws or regulations change.

SharePoint 2010 and Department of Defense

As you might know, the records management solution in SharePoint 2010 is not certified against the DoD 5015.2 standard. MOSS 2007 was certified, but with 2010 Microsoft decided not to go through the pains of getting its product tested and approved. There are multiple reasons behind this decision, but probably the most important is that certification requires substantial effort and time. Microsoft wants to focus on developing a collaboration platform, leaving the more detailed compliance requirements to software partners.

But how important is this decision? In conversations with records management professionals I often hear the opinion: “Who cares? The DoD standard is military oriented, with a strict set of rules that most organizations will never need”. They are right; most organizations will probably never need that level of compliance. However, the point lies elsewhere. Certification guarantees that the software product delivers all that the organization will ever need, and most probably more, at least when it comes to records management. The organization does not need to use all the features; however, having such capabilities removes at least one compliance-related concern when selecting a software product.

For example, how would executives in your organization feel if they found out that the SharePoint records management solution you just implemented does not guarantee irrecoverable destruction of records that have passed their retention period? Out of the box, SharePoint does not provide a way to expunge records after they are deleted. As you might know, there have been several criminal cases in which courts requested the recovery of deleted files, and specialized agencies were often successful in this task.  I am sure that some executives in government and large corporations would become quite nervous knowing that.

The bottom line is that SharePoint is a great solution for implementing records management; however, organizations need to take into account all the requirements across the organization. I mean all the requirements: not only those explicitly stated by records managers but also the implicit business needs. Some of these requirements will need to be fulfilled by adding third-party web parts or application services. This, on the other hand, increases the total cost of ownership, so finding the proper balance between requirements, planning and design is critical.

Lost cause in records management – convenience copies

I found some interesting facts in a recent AIIM poll, “Records Management Strategies – plotting the changes”. As many as 48% of respondents said that, although they were concerned about convenience copies of disposed records being left behind at the end of their retention period, they did not have a solution in place to address the problem. It sounds like a paradox: on the one hand organizations spend millions to implement enterprise content management systems, and on the other they leave on the table the key benefits of implementing such systems and processes. In another, related question, respondents said that their strongest business drivers for ECM were compliance with legislation and industry regulations (45% and 35%), reduction of storage costs (42%), sharing of knowledge (36%), and improvement of litigation performance and reduction of the associated costs (35%).  When convenience copies are left unattended, none of these drivers is addressed, and the organization is often deluded into thinking that it has achieved its key objectives. Even if the ‘official records’ are disposed of, the organization is still not compliant with laws and regulations, storage costs are not reduced, eDiscovery costs will be high because all information will have to be searched, and business decisions will often be based on outdated information. The missing last step in the implementation of the information management strategy undermines the organizational effort. This might not be surprising, as over 35% of respondents cited lack of board/C-level commitment and lack of cross-departmental agreement on how to manage electronic records as the key obstacles to implementing information management strategies.
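
One possible way to approach the convenience-copy problem is to keep a fingerprint of each record at disposal time and then scan shared locations for byte-identical copies. The sketch below is only an illustration and not a feature of any particular ECM product; the function names, the paths and the in-memory `file_shares` dictionary are hypothetical, and the technique catches only exact copies, not edited or re-saved versions.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of the content, kept in place of the disposed record."""
    return hashlib.sha256(content).hexdigest()


def find_convenience_copies(disposed_fingerprints, file_shares):
    """Return paths whose content is byte-identical to a disposed record."""
    return sorted(
        path
        for path, content in file_shares.items()
        if fingerprint(content) in disposed_fingerprints
    )


# Hypothetical data: fingerprints are recorded when the official record is disposed.
disposed_fingerprints = {fingerprint(b"2004 supplier contract")}

file_shares = {
    "/shares/legal/contract_copy.doc": b"2004 supplier contract",
    "/shares/sales/price_list.xls": b"current price list",
    "/home/jsmith/old/contract.doc": b"2004 supplier contract",
}

leftovers = find_convenience_copies(disposed_fingerprints, file_shares)
print(leftovers)
```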

The lesson learned is that groups responsible for implementing information management within organizations need to work continuously on marketing ECM and on building strong business cases based on hard, measurable benefits. Even when this is done, there must be an ongoing effort after the implementation to accurately monitor the key performance indicators and success criteria. The outputs of these measurements should reinforce the marketing messages, helping to win the required support.

Transition – Data, Information, Knowledge, Wisdom

I looked at the relationship between the concepts of Data, Information, Knowledge and Wisdom in one of my previous posts. At the time, however, I was looking from a slightly different perspective. In this post I focus more on the factors that influence the transition from collected raw data to an entity as abstract as wisdom.

Concept | Definition | Factors contributing to transition | Abstraction Level
Data | Simplest representation of facts such as numbers, characters, graphics, images, sound and video. Initially in ‘raw’ format, needs to be further processed to gain meaning. | Associated metadata is required to add context, describing business understanding, format, date/time, importance and others | Low
Information | Processed collection of data, with associated metadata describing the context. There might be various metadata dimensions, allowing new information and meaning to be created from different aggregations of facts. It is Data in a context. | Identification of trends, patterns, relationships and assumptions. | Medium
Knowledge | Awareness, understanding, familiarity, recognition of situational patterns and trends, based on synthesis of collected information that could be used to achieve a business purpose. It is Information in a perspective. | Acquiring of skills through experience or education. It includes perception, learning, communication, association and reasoning. | Medium High
Wisdom | Making the best use of knowledge, acting with appropriate judgement in complex and dynamic environments, in a way that actually achieves the business purpose. Directly related to maturity but not related to how long the organization is in business. It is applied knowledge. | | High

 

Graphically, this could be presented in the form of a pyramid, with increasing maturity and abstraction level.

 

As the abstraction level increases, the concepts become much more difficult to define and describe. Wisdom, for example, in contrast to Data, becomes more of a philosophical idea. The higher the level of abstraction, the fewer organizations can be found utilizing the concept. This is not surprising, given the direct relationship with maturity levels. However, this is the critical factor that differentiates winners from the rest. Most organizations focus their resources on achieving immediate tactical goals. This works well in the short term, but as we can usually see, such organizations survive only in a friendly business environment. As soon as market trends change, they are endangered by takeovers or breakups. Only a few are able to make such a transition, although I don’t think that any have fully achieved the Wisdom level. Information management does not contribute directly to the products or services that organizations sell but, like the nervous system in an organism, it is critical to using the available resources to their full potential. The better the distribution, sharing and collaboration, the better the odds of winning with innovative products, and of survival.

Information Management Context – Project Manager’s View

Implementing information management projects is quite complex in ever-changing business environments. The success or failure of such initiatives is often determined by the project manager’s ability to see the big picture. Quite often such projects fail because the team concentrates on technology, neglecting other aspects of the environment. Technology is obviously important, but it is merely one part of the whole picture. Information management projects do not exist in isolation. There are many factors that need to be taken into account during planning and then closely monitored during execution. The project manager needs to be alert to any changes in the environment and be ready to adapt. Rushing ahead with a project that no longer addresses a business need is going to lead to disaster.

What are the key elements of the environment that need to be addressed? The answer depends on the organization itself, but usually they can be grouped into the following classes:

  • Business goals, principles and trade specific practices

The direction of the business, and where it is going to be in 3 to 5 years, has a direct impact on the definition of business needs. Information management projects need to anticipate the change that is going to occur and make sure that the delivered business systems will support these needs, with enough flexibility to adapt the systems easily when new requirements appear. For example, when implementing a taxonomy, the project manager needs to make sure that it is scalable, so that the organization will not have to spend a fortune redesigning the system.  Buying a trade-specific classification from a third party might save time, but each organization is different, so it will require customization. The ability to satisfy business needs will also affect current and future end user satisfaction.

  • Organizational structure, roles and responsibilities of key stakeholders

Since every organization is different, it is not possible to use a single cookie-cutter approach. Identifying key players early in the project and keeping them engaged during delivery is critical to the project’s success. Making sure that stakeholders understand their accountabilities and responsibilities, not only while the project is active but also later when the delivered systems are operational, will help with establishing proper governance and change management.

  • Technology

Technology quite often introduces constraints on the project, due to existing legacy systems or to decisions already made to standardize on specific products. However, the project manager also needs to anticipate future changes in other systems that generate data or consume information. It is important that the project team works closely with enterprise architects and closely monitors any other projects already on the roadmap. Quite often such projects bring surprises that heavily affect the project’s success. Even upgrades to existing systems might introduce a need for change.

  • Corporate structure

Corporate structure changes quite frequently. Information management systems should be independent of that structure, with a taxonomy based primarily on business processes; in practice, however, corporate divisions often have some independence in selecting tools and implementing systems. Information management projects often have an enterprise-wide effect, so making sure that all the involved groups are brought to the table is extremely important. This is going to save a lot of time and money in the future, when the organization tries to leverage its ability to mine information and knowledge.

  • Information Management practices

Depending on the maturity of the organization, processes and practices might or might not be in place already. The project manager must be aware of them before and during implementation. Also, delivery of a new system has a ripple effect on the organization’s overall ability to grow in maturity, affecting governance and change management.

 

In summary, project managers implementing information management projects need to be acutely aware of the complexities of the whole environment, rather than focusing narrowly on deliverables. Ultimately, the success of a project is measured not only by being on time, within budget and within scope, but primarily by acceptance and usage after the project is delivered.

Is SharePoint records management capability sufficient?

TAB recently published the results of a survey on the adoption of SharePoint.

Not surprisingly, adoption is constantly increasing. Of the 730 organizations surveyed, 64% used SharePoint to some degree, but only 35% of those used it for records management. Of those who didn’t use SharePoint for RM, 55% were considering using it in the future.

Of all the organizations using SharePoint, 87% used it for file sharing and 80% for document management. Only 26%, however, used it for integration of metadata. I think one of the reasons might be that most of these installations were still on SharePoint 2007, without the improvements to metadata management that were introduced in the 2010 version.

The surprising part, however, is that 55% of respondents were using, or considering using, add-ons for records management rather than the native features in SharePoint (Records Center or in-place records management in SharePoint 2010). Does it mean that users do not trust the out-of-the-box functionality, or find it insufficient? It would be interesting to find the answer to this question.

Information Management Trends

Recently, while doing some research, I found in my documents a reference to an old Gartner report on knowledge worker productivity and its relationship to search. The report was from 2002 and stated that knowledge workers spent between 30 and 40% of their time searching for information, and that they were less than 50% successful in their efforts. According to Kathy Harris and Regina Casonato, employees got 50 to 70% of their information from other people rather than from their search results.

This referred to both electronic and physical documents. Physical documents are usually better organized; electronic ones often quickly become an information dump. Since then, new tools have been adopted and the ratio of electronic to paper documents has increased. With the wide adoption of tools like SharePoint, instant messaging, mobile phone texting, Twitter and so on, there has been a dramatic increase in the amount of information being created and transmitted. Are we better at information management now than we were then? I don’t think so. Although search capabilities have improved and we use more powerful processors, full content search is still not the answer. Are we capturing more contextual information to help with targeted search? After seeing multiple implementations of SharePoint, my answer is: mostly no. Setting up SharePoint sites has often become too easy, with no proper thought put into developing information architecture and governance. Very soon such installations turn into an information junkyard.

So what was the cost of lost productivity back in 2002? Assuming that the average fully loaded salary of a knowledge worker was about $80,000 per year, 30% comes to about $24,000 per worker, per year. These costs are mind-blowing, especially if we take into account the success rate of less than 50%. This raises a question: could such figures be used in ECM business cases to support financial benefits, without accountants rejecting them as purely soft benefits? I touched on this in my previous blog post, and Jason White suggested an interesting concept of using Business Intelligence tools to identify these benefits. But how do we do it before we have ECM tools in place?
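
The arithmetic above can be written out as a small calculation. The $80,000 salary and the 30 to 40% range come from the text; the split into "unsuccessful" search cost is my own simplifying assumption that time is spread evenly across successful and unsuccessful searches, so treat that figure as a rough illustration only.

```python
def annual_search_cost(salary, percent_of_time_searching):
    """Salary cost of the time a worker spends searching, per year."""
    return salary * percent_of_time_searching / 100


def unsuccessful_search_cost(salary, percent_of_time_searching, success_percent):
    """Portion of the search cost spent on searches that did not succeed.

    Assumes time is spread evenly over successful and unsuccessful searches.
    """
    search_cost = annual_search_cost(salary, percent_of_time_searching)
    return search_cost * (100 - success_percent) / 100


salary = 80_000  # assumed fully loaded salary per year, as in the text

low = annual_search_cost(salary, 30)    # lower end of the 30-40% range
high = annual_search_cost(salary, 40)   # upper end of the range
wasted = unsuccessful_search_cost(salary, 30, 50)

print(low, high, wasted)
```

Since the reported success rate was below 50%, the `wasted` figure is, under that assumption, a lower bound rather than an estimate.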

This relates to today’s report from Gartner on the top 10 tech trends for 2012. Here are a few interesting highlights relevant to information management:

-          The average teenager sends over 4,762 text messages per month – I am sure that busy executives with their BlackBerrys send fewer than this, but it still shows how quickly the volume of information is increasing.

-          Context-aware computing, using information about the end user’s or object’s environment to improve the quality of interaction – metadata and information architecture come to mind immediately, and their importance will keep growing.

-          The Internet of everything, with pervasive computing linking information-generating input points like cameras, sensors, microphones, image recognition and so on. This is not only about information volume but also about privacy.

-          Next-generation analytics – improvements in processing power will shift analytics from data centers to end user platforms, including mobile devices. This will empower end users to do a lot of analysis themselves.

So what does this all show? It seems that the problems from 10 years ago have still not been resolved, and information management is still trying to catch up with technology. The focus of information management will have to shift towards proactive development of agile taxonomies, automatic tools to capture and normalize metadata and so facilitate targeted search, and simpler analytics tools for end users. Hopefully this will translate into increased knowledge worker productivity.

Information as an asset – part II

Last week I wrote about the importance of treating information as an asset. The bottom line is that, due to its intangibility, its value is difficult to measure and is thus, more often than not, totally neglected and ignored. However, unless the value is shown, managers will continue sidelining information management projects, unwittingly leading their organizations further into the chaos of information overload.

So is there any practical advice on how to approach estimating the value of information?  There is no simple answer, as every organization perceives and uses information in different ways. However, as with Generally Accepted Accounting Principles, a certain set of rules could be worked out to build a foundation for an estimation model.

Why do we need it?

With hundreds of projects competing for limited financial and human resources, information management projects usually end up low on the list. The exceptions are fancy pet projects such as implementing the latest trendy applications. Such projects often bring limited benefits to the organization compared with the costs and effort invested. Their business cases usually have financial models built on shaky numbers, based primarily on soft benefit estimates that accountants often discount.

The reason we want to estimate the value of information is to make sure that scarce resources are channeled to those projects that address what is most important to the organization. The information value can be segmented from the following three perspectives:

-          Business need – when information is part of the business workflow, is related to improvements in productivity, plays a direct part in marketing strategy, or needs to be visible to external business stakeholders. Ensuring a single version of the truth also plays a key role here, when the organization wants to make sure that decisions are made based on the latest version of the information. The business need also includes improving productivity by allowing users to find information faster, and to spend less time managing information that does not have to be managed to the same degree (e.g. transitional records). The Pareto rule applies here pretty well: spend 80% of your time on managing the 20% of informational artifacts that matter most.

-          Risk – the organization needs to comply with legal, regulatory or statutory requirements, and needs to provide evidence of business decisions, activities and transactions

-          Costs – the cost of replacing the information, or costs related to acquiring it, such as licensing and subscriptions

The value of information is realized only through its use, and this should be the criterion for its measurement. However, ‘use’ is totally subjective. Attempts to measure information value through more tangible dimensions, like availability through business intelligence tools or data volume, cannot be successful. In the case of BI, for example, how well information is aggregated says nothing about the capabilities of the organization’s management. Likewise, the volume of data does not reflect its quality, its content or the ability to find information within it.

Valuation approaches

As mentioned above, the valuation of information is often complex due to its dependency on many, often intangible, factors. However, there are some situations in which a straightforward valuation can be done. But first let’s lay some groundwork. There are two types of approaches:

-          Qualitative – tending to be subjective, describing information in terms of some categorization, often informal

-          Quantitative – based on hard numbers and as such, more objective and reproducible

Information valuation fits somewhere on the continuum between purely qualitative and purely quantitative. How close it sits to either end of the spectrum will depend on the type of information, how it is used, its purpose, the type of business organization, risk impacts, organizational culture and so on. The organization needs to develop a set of classes for its informational assets and categorize the assets accordingly. Once this is done, it becomes possible to apply different strategies for managing the information and to prioritize related projects.

 

The development of such a set of classes should be governed by some basic principles.


Information as an asset

In some business areas, the concept of an asset is fairly well developed. On one side, the key motivator is the usage and maintenance of the assets; on the other, there is the financial aspect: how accountants perceive the assets and how they depreciate over time. The financial aspect is a great motivator to keep the process clean, as it is usually regulated and affects the company’s bottom line. The usage and maintenance of physical assets, on the other hand, is less structured but easily understandable. You can see it, you can touch it, and if you don’t maintain it, it will stop working.

Financial assets are different in certain ways; however, their management process is well developed, as it is the primary vehicle for increasing revenues and is mostly regulated. The same motivating factor of proper accounting plays a significant role.

Information as an asset is a much more difficult concept to grasp, and it is often neglected. One of the reasons is that accountants don’t know how to book it, so the underlying motivating factor described above is simply not there. The only time organizations dig in their heels with regard to information is during contract negotiations, when it comes to protecting intellectual property. Otherwise, information is allowed to float with little structure, oversight, protection or management. However, information is an asset, and as an asset it has its own intrinsic value. True, it is difficult to measure; nevertheless, organizations need to define information as an asset and integrate its life-cycle with their overall operational and financial processes. This becomes even more important when the focus of the company shifts from delivering physical goods to delivering services.

The same stages of the asset life-cycle apply to information:

  • creation
  • storage and preservation
  • management
  • use
  • disposal

Information shares some attributes with physical assets, for example being time variant. As it ages, information’s value usually decreases, and this needs to be factored in when developing an information value estimation model.
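
As a purely illustrative sketch of how ageing could be factored into such a model, the snippet below applies an exponential decay with a half-life. The choice of exponential decay, the function name and the sample numbers are my assumptions; the right curve will differ by information type, and some information may not lose value at all.

```python
def current_value(initial_value, age_years, half_life_years):
    """Estimated value after exponential decay with the given half-life.

    Assumption: value halves every `half_life_years`; this is one possible
    model, not an established valuation rule.
    """
    return initial_value * 0.5 ** (age_years / half_life_years)


# Hypothetical example: a market report valued at 1000 with a 2-year half-life.
for age in (0, 2, 4):
    print(age, current_value(1000, age, 2))
```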

The most important aspect, however, is that the value of information is realized only when it is used. Information that cannot be found is worthless. That is why the ability to search for and find information is a key element of information management. Technology is less important here; developing the right taxonomy, classification and controlled vocabularies, with the ability to tag information at the point of creation, plays the key role.

Therefore to be successful, organizations need to:

  • Define information life-cycle and its value
  • Integrate the information life-cycle with overall operational and financial processes of the organization
  • Define information architecture and keep it up to date