To start, let’s consider a basic distinction between organizational processes. Following Sig over at Thingamy, two basic types exist: easily repeatable processes (ERPs) and barely repeatable processes (BRPs).
ERPs: Processes that handle resources, from human (hiring, firing, payroll and more) to parts and products through supply chains, distribution and production.
BRPs: Typically exceptions to the ERPs, anything that involves people in non-rigid flows through education, health, support, government, consulting or the daily unplanned issues that happen in every organisation.
As I noted in Social Learning and Exception Handling, BRPs result in business exceptions and take up almost all of the time employees spend at work. Interestingly, much of the writing I see on Big Data is about making ERPs more efficient or making guesses about when to expect occurrences of a BRP. In other words, both goals are really about making coordination of organizational efforts more efficient and/or effective.
How organizations coordinate their activities is essential to the way they function. What makes sense for the organization’s internal processes may not make sense in its ecosystem, and vice versa. These are distinctions that analysts of Big Data sometimes fail to note and consider.
For example, in The Industrial Internet the Future is Healthy, Brian Courtney notes the following about the use of sensors in industrial equipment and the benefits of storing and analyzing their data at big data scale.
Data science is the study of data. It brings together math, statistics, data engineering, machine learning, analytics and pattern matching to help us derive insights from data. Today, industrial data is used to help us determine the health of our assets and to understand if they are running optimally or if they are in an early stage of decay. We use analytics to predict future problems and we train machine learning algorithms to help us identify complex anomalies in large data sets that no human could interpret or understand on their own [my emphasis].
The rationale behind using data science to interpret equipment health is so we can avoid unplanned downtime. Reducing downtime increases uptime, and increased uptime leads to increases in production, power, flight and transportation. It ensures higher return on assets, allowing companies to derive more value from investment, lowering total cost of ownership and maximizing longevity.
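The kind of analytics Courtney describes can be sketched in miniature. The following is a hedged illustration, not GE’s actual system: it flags sensor readings that drift far from their recent baseline, the simplest form of the anomaly detection used to catch early-stage equipment decay. The function name, window size, and threshold are my own illustrative choices.

```python
# A minimal sketch of baseline-drift anomaly detection on equipment
# sensor data (illustrative only; real industrial systems use far
# richer models over many correlated signals).
import statistics

def flag_anomalies(readings, window=10, threshold=3.0):
    """Return indices of readings more than `threshold` standard
    deviations from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline)
        # A reading far outside the recent baseline is flagged for review.
        if stdev > 0 and abs(readings[i] - mean) > threshold * stdev:
            anomalies.append(i)
    return anomalies

# Simulated vibration-sensor trace: steady around 1.0, with one spike
# of the sort that might signal early bearing wear.
trace = [1.0, 1.02, 0.98, 1.01, 0.99, 1.0, 1.03, 0.97, 1.01, 1.0,
         1.02, 0.99, 2.5, 1.01, 1.0]
print(flag_anomalies(trace))  # -> [12], the spike
```

The point of the sketch is that the algorithm only answers the question it was given: whether a number departs from other numbers. Why the machine behaves that way, and what to do about it, still depends on context the data alone does not carry.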
In other words, Courtney’s analysis of the big data generated from sensors that constantly measure key indicators about a piece of equipment assumes that the data itself ensures a decrease in downtime and an increase in uptime, resulting in increases in production, power, flight and transportation. Yet the implied causal relationship doesn’t translate to all cases, especially those involving barely repeatable processes (BRPs) that produce business exceptions. It is in BRPs that the real usefulness of big data manifests itself, but not on its own. As danah boyd and Kate Crawford note in Critical Questions for Big Data, “Managing context in light of Big Data will be an ongoing challenge.”
Big Data Needs Thick Data
Consider a case described in another post here, Business Exceptions Are Not Always What They Seem. As I noted in that post:
In Quality vs Quantity: Why Faster Is Not Always Better Barry Lynch of GE explains a case in which GE’s Proficy Workflow solution prevented the use of informal work practices on the factory floor.
One company implemented the Workflow solution with two main goals: reduce the number of “off quality” batches per month and increase the speed of batch production. On one of my visits, the automation manager and I were talking about the system and how they had not had a single batch rejected due to production issues since it went live. Based on the costs of a lost batch, this was huge for them, but they had not seen the increased speed they were expecting in the primary batching area.
To investigate, we both went down to the line and met with one of the operators using the system. We asked her honest opinion of the system, and she was very complimentary on how it helped her manage daily tasks. Then we asked, “But why have you not been able to speed up the process?” Her response was straight to the point: “With this system in place no one takes the shortcuts we used to. The system makes sure we all make the same batches the same way consistently.”
So, we concluded that the quality issues seemed to be caused by people trying to speed up the process as opposed to following their own documented processes. They made a few more batches, but those gains were offset by expensive rejects that occurred.
In other words, the practical knowledge gained from informal learning on the factory floor allowed employees to take shortcuts to speed up batch production, even though the unintended result was to increase off-quality batches. A classic speed vs. quality tradeoff. It would, of course, be useful to know why operators in this situation felt the need to speed up production prior to the GE system implementation. A variety of scenarios could explain it.
As in many instances of Big Data implementation, without insights from associated Thick Data the decision-makers’ assumptions define what to expect, i.e. that uptime leads to increases in production, power, flight and transportation. Thick Data, as Tricia Wang characterizes the use of ethnography to supplement analytics, provides the context for Big Data.
Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories.
Without the stories and tacit knowledge in Thick Data, the guiding assumptions behind Big Data implementations proceed without regard to the context of their use. What Barry Lynch of GE uncovered, that taking discretion out of batch production slowed the process while increasing quality, describes an organizational practice in which a barely repeatable process (BRP) involving local, informal knowledge occurred and informed efforts to speed up batch processing.
It seems plausible that speeding up batches meant taking shortcuts, which presumably meant skipping certain steps in running a batch. In other words, the context of those coordination decisions constituted the Thick Data needed to manage the coordination process.
Thomas Erickson of IBM puts it somewhat differently, but his point is in line when he notes:
People are everywhere. They are experts on local information – their homes, their places of work, the routes they travel every day. That’s what they care about, and that’s what will get them involved. And when we design systems that make a place for people to be active, first class participants, where millions can share what they know, then we’ll really see smarter.