Part 2: The method
The fun part of designing a big data system is that there is only a fuzzy light at both ends of the tunnel: the data is not in a stable, normalized form, and the intended presentation is not clear either, mostly because we only get a better idea of what we want once we start having something to look at.
Coming up with a universal data model that covers all the cases is not possible, for the simple reason that the goals are many, and they are growing and changing.
To address this challenge, I started using a simple analysis methodology:
I separate the scenarios from the use cases, and let the two evolve hand in hand:
- Use cases are the intended actions that rely on the data, independent of whether that data is actually available.
- Scenarios are the conditions and data that can be available, regardless of how they will be used.
According to W. H. Inmon (if I'm not misquoting him; I lost the reference), data analysis does not represent the information needed for the use cases; it represents what we could possibly do with the data (in a future post I'll give an analogy that should make this clearer).
For big data, we can iteratively build up the data model as more use cases are found, and iteratively build up possible uses for the data as more data becomes available and is integrated or federated into the data model.
These two aspects are synergistic: new use cases can create a need for new data, and the availability (or unavailability) of data can determine what that data can be used for.
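As a minimal sketch of this two-way feedback, assuming a use case can be reduced to a name plus the set of data items it needs, and the available data to a set of item names, one review pass can report which use cases the current data supports, which data each remaining use case still lacks (new data needs), and which available data no known use case touches yet (a prompt to look for new uses). `UseCase`, `review`, `unused` and all item names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """Hypothetical use case: an intended action and the data items it needs."""
    name: str
    required: set[str] = field(default_factory=set)


def review(use_cases, available):
    """Return (supported use case names, {use case name: missing data items})."""
    supported = []
    gaps = {}
    for uc in use_cases:
        missing = uc.required - available
        if missing:
            gaps[uc.name] = missing      # a new data need raised by a use case
        else:
            supported.append(uc.name)    # the current data already supports it
    return supported, gaps


def unused(use_cases, available):
    """Available data that no known use case requires (keep it, look for uses)."""
    needed = set().union(*(uc.required for uc in use_cases))
    return available - needed


# Hypothetical data items and use cases.
available = {"glucose_nurse", "medication_admin", "admission_date", "ward"}
use_cases = [
    UseCase("readmission_risk", {"admission_date", "medication_admin"}),
    UseCase("glucose_trend", {"glucose_nurse", "glucose_device"}),
]

supported, gaps = review(use_cases, available)
print(supported)                      # ['readmission_risk']
print(gaps)                           # {'glucose_trend': {'glucose_device'}}
print(unused(use_cases, available))   # {'ward'}
```

Running the review again each time a use case or a data source is added is one simple way to let both catalogs grow iteratively without forcing one universal model up front.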
This applies to both data and metadata:
- The absence of some data can be interpreted differently depending on what is missing (a missing record that an injection was administered in a hospital has a different impact and meaning from a patient not reporting that they take antidepressant medication).
- The quality of the information may depend on who provides it (for example, cultural factors will influence a customer's report of their use of drugs, contraceptives, etc.).
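One way to keep these distinctions in the model is to store the reason for a missing value and the provider of each value as metadata, instead of collapsing every gap into a bare null. The sketch below assumes just two absence reasons and a free-text `source` field; `Absence`, `Observation`, `interpret` and the item names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Absence(Enum):
    """Hypothetical reasons a value can be missing; each means something different."""
    NOT_RECORDED = "not_recorded"   # should have been logged (e.g. a hospital injection)
    NOT_REPORTED = "not_reported"   # depends on the patient volunteering it


@dataclass(frozen=True)
class Observation:
    patient_id: str
    item: str                         # e.g. "injection_given", "antidepressant_use"
    value: Optional[bool]             # None when the value is absent
    source: str                       # who provided it: "hospital_log", "self_report", ...
    absence: Optional[Absence] = None  # why it is absent, if known


def interpret(obs: Observation) -> str:
    """Give a present or missing observation a reading that keeps its metadata."""
    if obs.value is not None:
        return f"{obs.item}={obs.value} (source: {obs.source})"
    if obs.absence is Absence.NOT_RECORDED:
        return f"{obs.item}: missing record, possible documentation gap"
    if obs.absence is Absence.NOT_REPORTED:
        return f"{obs.item}: not reported, unknown (self-report may be incomplete)"
    return f"{obs.item}: missing, reason unknown"


print(interpret(Observation("p1", "injection_given", None, "hospital_log", Absence.NOT_RECORDED)))
# injection_given: missing record, possible documentation gap
print(interpret(Observation("p1", "antidepressant_use", True, "self_report")))
# antidepressant_use=True (source: self_report)
```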
When building the model (and keeping it very carefully separate from the use cases), my rule is: keep everything, just in case. For example, a cell phone can have an attached device that monitors the patient's glucose levels. This is not the same as a measurement taken by a nurse, but we don't want to throw this information away just because a hospital does not currently want to use it.
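A minimal sketch of the "keep everything" rule, assuming each reading carries its source: the store accepts every reading, and each use case selects the sources it trusts at read time, so nothing is discarded when one use case ignores a source. `Reading`, `RawStore`, `view` and the source names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    patient_id: str
    measure: str   # e.g. "glucose_mg_dl"
    value: float
    source: str    # e.g. "nurse", "phone_device"


class RawStore:
    """Append-only store: every reading is kept, whatever its source."""

    def __init__(self) -> None:
        self._readings: list[Reading] = []

    def add(self, reading: Reading) -> None:
        self._readings.append(reading)

    def view(self, measure: str, accepted_sources: set[str]) -> list[Reading]:
        """Use-case-specific view; filtering happens only at read time."""
        return [r for r in self._readings
                if r.measure == measure and r.source in accepted_sources]

    def __len__(self) -> int:
        return len(self._readings)


store = RawStore()
store.add(Reading("p1", "glucose_mg_dl", 110.0, "nurse"))
store.add(Reading("p1", "glucose_mg_dl", 126.0, "phone_device"))

hospital_view = store.view("glucose_mg_dl", {"nurse"})   # what the hospital uses today
print(len(hospital_view), len(store))                    # 1 2
```

The device reading stays in the store, so a later use case can simply widen `accepted_sources` instead of needing the data to be collected again.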