I really enjoyed Leena Rao's recent article on TechCrunch, "Why We Need to Kill "Big Data"", not least because it reflected pretty much how I've been feeling for a long time about the lazy, misleading use of jargon and never-ending hype cycle as the tech vendors jump upon the latest bandwagon. (Pick one, any one, because they're all at it. Yes, I'm looking at you, IBM, Oracle, Microsoft, Teradata, SAP...)
Leena's perspective also provides a healthy counterpoint to the views currently being positioned by IDC (as reported in the Data Informed article "Now is the time to buy into Big Data"), which seem to offer a very tech-centric point of view and play right into the hype cycle without actually saying anything.
We've been here before - Decision Support, OLAP, Management Information Systems, Performance Measurement, Business Intelligence, Master Data Management - all are catch-all terms that actually had little if any real meaning in and of themselves. "Big Data" is just the latest entry in the Buzzword Bingo Lexicon.
In the twenty-odd years I've been involved in business solutions delivery and management consulting, it seems that *insert tool of choice* is always promoted as the "next big thing to solve all your problems", without any thought whatsoever being given to what the problem, scenario or challenge might actually be. The technology is almost always the wrong entry point for the conversation, because technology only becomes relevant when it is applied to solving a particular problem - and depending on the problem at hand, some technologies are more equal than others.
In the Information Management space, almost all problems relate to capturing, exchanging, disseminating, sharing, interpreting and acting upon one or more data sets. (Data Governance then sets out to ensure that these tasks can happen in a repeatable, consistent, efficient and effective manner.) Which got me thinking. Are there generic categories of "Information Use Case" that we can use to describe various business problem scenarios, so that we can then start to make more informed choices about what sorts of technologies might be appropriate?
Or to put it another way: "what's your point, caller?!"
Anyway, here are some information-related problem situations that I could identify (you might think of others, or take issue with some of mine. I'd be delighted to hear from you if you do, because it means you've been thinking about the business problem, and not about the technology!):
Note that there is likely to be interaction (or even significant overlap) between these classes of use case. Each individual class of Use Case should be considered a necessary, but not sufficient, element of the organisation's Data Governance and Information Management capability.
Data is of little value if all it does is sit in the data warehouse. As a result, the presentation layer is of very high importance.
Most On-Line Analytic Processing (OLAP) vendors have a front-end presentation layer that allows users to call up pre-defined reports or create ad hoc reports. The aim is to synthesise large quantities of raw data into meaningful views that can be acted upon in context.
As such, reporting against structured data can be viewed as a specific type of authoring process; any reporting output is likely to be produced and submitted to the more general publishing process.
A number of key considerations need to be taken into account as part of the reporting capability:
- Number of reports: The higher the number of reports, the more likely it is that purchasing a pre-built vendor solution is the right approach. Reporting tools typically make creating new reports easier (by offering re-usable components) and also provide report management systems to make maintenance and support functions easier.
- Desired Report Distribution Mode(s): will reports only be distributed in a single mode (for example, email only, or over the browser only), or will users access the reports through a variety of different channels?
- Ad Hoc Report Creation: in most environments, it is expected that end users will be able to create their own ad hoc reports. Ad hoc report creation necessarily relies on a strong metadata layer and a shared understanding of what the information presented in the report is communicating.
- Data source connection capabilities: in most modern environments, users will need to access data sources using both relational database and OLAP multidimensional data technologies.
- Scheduling and distribution capabilities: in a realistic data usage scenario, senior executives will only have time to come in on Monday morning and look at the most important information from the previous week. To meet this need, the reporting tool must have scheduling and distribution capabilities. Weekly reports are scheduled to run on Monday morning, and the resulting reports are distributed to the senior executives either by email or web publishing.
- Security Features: reporting tools are geared towards a number of users in different Business Units and teams, with different priorities and responsibilities. Therefore, ensuring that people see only what they are supposed to see is important. Most reporting tools have capabilities to manage security at different levels, including at the report level, folder level, column level, row level, or even individual cell level. Furthermore, they have a security layer that can interact with the common corporate login protocols and "single sign-on" policies.
- Export capabilities: data export is commonly required for Excel, flat file, and PDF formats. It may also be desirable and time-saving to export the reporting format as well as the data itself.
- Integration with the Microsoft Office environment: It is likely that reporting information will need to be incorporated into documents created with Microsoft Office products, especially Excel, both for manipulating data and for publishing. Some reporting tools now offer a Microsoft Office-like editing environment for users, so all formatting can be done within the reporting tool itself, with no need to export the report into Excel.
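To make the Security Features point a little more concrete, here is a minimal Python sketch of row-level and column-level filtering. It is purely illustrative and not how any particular reporting tool does it: the `ReportUser` class, the `secure_view` function, and the field names (`business_unit`, `salary_cost` and so on) are all hypothetical names I've made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportUser:
    """Hypothetical user profile; a real tool would take this from corporate login."""
    name: str
    business_unit: str
    hidden_columns: frozenset = frozenset()


def secure_view(rows, user):
    """Return only the rows and columns that this user is supposed to see."""
    visible = []
    for row in rows:
        # Row-level security: skip rows that belong to another business unit.
        if row["business_unit"] != user.business_unit:
            continue
        # Column-level security: drop any column hidden from this user.
        visible.append(
            {key: value for key, value in row.items() if key not in user.hidden_columns}
        )
    return visible


# Made-up sample data, for illustration only.
rows = [
    {"business_unit": "Finance", "region": "North", "revenue": 120, "salary_cost": 40},
    {"business_unit": "Sales", "region": "South", "revenue": 95, "salary_cost": 30},
]
analyst = ReportUser("analyst", "Finance", frozenset({"salary_cost"}))
print(secure_view(rows, analyst))
```

The point of the sketch is that the access rules belong to the user and the data, not to the individual report, so every report built on the same data inherits the same restrictions.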
Strategic Intelligence and Data Mining
Data mining is the process of discovering new patterns and inferences in large data sets, involving a range of methods and techniques such as artificial intelligence, machine learning, statistics and database systems.
The goal of data mining is to extract knowledge from a data set in a human-understandable structure and may involve a complex process of database and data management, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of found structure, visualisation and online updating.
It is likely that a risk-based approach will need to incorporate information processing and data analytic features including:
- Anomaly detection: identification of outliers, changes and deviations in the data records that might be either interesting findings or data errors, and that require further investigation.
- Association rule learning: searching for relationships and dependencies between variables.
- Clustering: discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
- Classification: identifying and applying a known, generalised structure or categorisation to new data. (For example, an email program might attempt to classify an email as legitimate or spam.)
- Regression: discovery of an approximation function that models the data with the least error.
- Summarisation: providing a more compact representation of a data set, including visualisation and report generation.
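Two of the techniques in this list are simple enough to sketch in a few lines of Python using only the standard library. This is a toy illustration rather than a recommended method: the function names `find_anomalies` and `fit_line`, the z-score approach to outliers, the threshold of 2.0 and the sample figures are all my own assumptions for the example.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up sample data, for illustration only.
daily_orders = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_anomalies(daily_orders))

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Note that the code only tells you which values are unusual. Deciding whether a flagged value is a genuinely interesting event or simply a data entry error is still a business question.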
At a minimum, content has to be written and it has to be posted. Between those two steps, it is usually checked for its writing quality and correctness. Legal and compliance may need to review it. Ideally, any publishable material will be reviewed by a high-level editor or editorial board to make sure that it is consistent in style and fact with other information already in the published domain.
As with the searching process, the authoring process will need to be context-aware to ensure that information is defined and used appropriately, within both the context within which it was created, and the context of any intended (or unintended) usage.
The capacity to provide timely, compelling and concise advice to inform senior decision makers and executives is a vital capability for any organisation.
The Executive Briefing process therefore requires departments and business units to be able to locate, collate, and interpret the available information, such that the context and rationale for any decision can be supported and substantiated.
In simple terms, education provides a knowledge base that underpins any other activities the individual may engage in at a later stage. Training is not as general and tends to concentrate on developing the skills needed for a specific task. Learning tends to be associated with the self-development of the individual.
Capability for education, training and learning of staff is a key aspect of service improvement in the University. In support of this, organisations need to provide an information sharing capability that enables all staff to access the process, policy and knowledge resources pertinent to their role.
Many information users will be interested in finding material that has been authored by someone else in the organisation.
Assuming this content has been made available for others to access, the means of finding, retrieving and accessing the required material may be many and varied, depending on a number of factors including: the nature of the content medium; the physical locations of the originator and consumer; the mechanisms available; and any other content that the consumer may wish to combine it with.
The nature of information content will also be dependent upon both the context within which it was created, and the context of the intended usage. Any search and retrieval process will need to be context-aware to ensure that information is used appropriately.
A technology-enabled approach to content search and retrieval will become increasingly important. However, it is also important to give due consideration to the governance authorities and control processes that define what content is to be made available.
It should also be noted that any content search capability does not stand alone and needs to be fully integrated with content authoring and publication processes and systems. As such, the search process is likely to be implemented as part of integrating Records Management, Document Management and Knowledge Management solutions.
Records Management is the practice of maintaining the records of an organisation from the time they are created up to their eventual disposal. This may include classifying, storing, securing, and destruction (or in some cases, archival preservation) of records. A record can be either a tangible object or digital information, such as office documents, databases, application data, and e-mail.
The ISO 15489-1:2001 Standard defines Records Management as "[the] field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use and disposition of records, including the processes for capturing and maintaining evidence of and information about business activities and transactions in the form of records". The standard defines "records" as "information created, received, and maintained as evidence and information by an organisation or person, in pursuance of legal obligations or in the transaction of business".
Records Management is primarily concerned with the evidence of an organisation's activities, and is usually applied according to the value of the records rather than their physical format. While there are many purposes of and benefits to records management, as both these definitions highlight, a key feature of records is their ability to serve as evidence of an event. Proper records management can help preserve this feature of records.
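The "creation to eventual disposal" lifecycle can be pictured as a retention schedule keyed on the class of record rather than its format. The Python sketch below is only an illustration of that idea: the record classes and retention periods in `RETENTION_YEARS` are invented for the example and are not taken from ISO 15489 or from any real retention policy, and the function names are hypothetical.

```python
from datetime import date

# Hypothetical retention periods in years, keyed by record class.
# Real periods come from legislation and organisational policy.
RETENTION_YEARS = {"financial": 7, "correspondence": 2}


def disposal_due(created, record_class):
    """Return the date on which a record becomes eligible for disposal review."""
    years = RETENTION_YEARS[record_class]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)


def is_due_for_disposal(created, record_class, today):
    """True once the retention period for this record class has elapsed."""
    return today >= disposal_due(created, record_class)


print(disposal_due(date(2012, 6, 1), "financial"))
print(is_due_for_disposal(date(2012, 6, 1), "correspondence", date(2013, 1, 1)))
```

Nothing in the sketch cares whether the record is a paper file, an e-mail or a database row. The schedule is driven by what the record is evidence of, which is the point made above.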
Many jurisdictions now make legislative provision based on the principle that government information is a public resource to be managed in the public interest. Such instruments give citizens the right to make requests to access Government documents. Similarly, where personal information is retained by an Agency, the individual has the right to request access to those records.