Forecast: Cloudy with Eventual Challenges
Understanding data sources well enough to acquire and integrate their data is rarely a slam dunk. After all, each data source is like an employee in a company: each comes to the table with similar content and behavior, but there are underlying differences, sometimes very subtle, that indicate the person is from somewhere else. My favorite example: the way the word "coffee" is pronounced across the US offers a little insight into where a person may be from, says the Jersey girl. With accents, you still understand the word and carry on with meaningful communication; data sourcing nuances, by contrast, define the phrase "the devil is in the details." Deltas vs. full pulls, whether the file needs interception for cleanup, SLAs/OLAs, upstream sources, downstream consumers… oh my. Not to mention the sensitive task of meeting data classification requirements, PII (Personally Identifiable Information) vs. non-PII, on which data owners and integrators need to collaborate closely.
- Low Risk: Gender, Age, Zip Code
- Medium Risk: Name, Address, IP address
- High Risk: Phone number, SSN, Financial information, aggregated data (Email, Name, Address)
The categories above reflect the impact on the company of failing to secure the information. Hence, a piece of information that is not High Risk on its own can become High Risk when aggregated with another piece of information.
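To make the aggregation point concrete, here is a minimal Python sketch of how an integrator might score a set of fields. It is an illustration under assumptions, not a standard: the names `Risk`, `FIELD_RISK`, `HIGH_RISK_COMBINATIONS`, and `classify` are hypothetical, the Medium rating for a standalone email is my own guess (the list above only mentions email as part of an aggregate), and rejecting unclassified fields is just one possible policy.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Per-field ratings taken from the list above.
FIELD_RISK = {
    "gender": Risk.LOW,
    "age": Risk.LOW,
    "zip_code": Risk.LOW,
    "name": Risk.MEDIUM,
    "address": Risk.MEDIUM,
    "ip_address": Risk.MEDIUM,
    "phone_number": Risk.HIGH,
    "ssn": Risk.HIGH,
    "financial_information": Risk.HIGH,
    # Assumption: the list does not rate email on its own.
    "email": Risk.MEDIUM,
}

# Field combinations that become High Risk when they travel together.
HIGH_RISK_COMBINATIONS = [frozenset({"email", "name", "address"})]


def classify(fields):
    """Return the overall risk of a set of fields, including aggregation."""
    present = {f.lower() for f in fields}

    # Assumption: unclassified fields are rejected so that the data owner
    # has to rate them before the feed is integrated.
    unknown = present.difference(FIELD_RISK)
    if unknown:
        raise ValueError(f"unclassified fields: {sorted(unknown)}")

    # Start from the riskiest individual field.
    risk = max((FIELD_RISK[f] for f in present), default=Risk.LOW)

    # Escalate if any risky combination is fully present.
    if any(combo <= present for combo in HIGH_RISK_COMBINATIONS):
        risk = Risk.HIGH
    return risk
```

With these ratings, `classify(["Name", "Address"])` returns `Risk.MEDIUM`, while adding `"Email"` to the same feed pushes the result to `Risk.HIGH`, even though no single field is rated High. In practice the ratings and combinations would come from the data owners rather than being hard-coded.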