External data and Data Warehouse

Enterprises are increasingly information centric, and recent trends reveal that most competitive businesses require external data in their enterprise data warehouse to strategically position themselves in the market. This article touches upon a few critical challenges specific to integrating data from external systems, as well as best practices and considerations for doing so successfully.

To build a powerful data warehouse, you must include as much relevant data from internal and external sources as possible to optimize managers' decision processes. As an example, retailers have current and historic sales data along with pricing information, but this provides only partial insight into the determinants that are driving sales. Information such as weather, income tax distribution periods, regional or local population growth, and household demography may also play a key role in driving sales and must be taken into consideration.

The government and many other organizations capture this data and distribute it free or for a nominal fee. Here are some examples:

Weather: Yahoo offers an RSS feed that can be called using an HTTP request as follows:

http://weather.yahooapis.com/forecastrss?p=48161

The parameters to the request are the following:

Parameter   Description                              Examples
p           US zip code or Location ID               p=95089 or p=USCA1116
u           Units for temperature (case sensitive)   f: Fahrenheit or c: Celsius

The RSS response from this request includes the following information:

Geographic latitude/longitude
Weather conditions (48 distinct codes)
Temperature (F, C)
Forecast (condition, high temperature, low temperature)

Using demographic data along with internally generated data can go a long way to enhance the data warehouse. The following are examples of where this data can be obtained:

http://www.geolytics.com/?gclid=CMeliJqrqJ0CFU1M5QodekqHkA

With limited data (address or lat/long information) you can get 60 demographic attributes for that address, including factors such as income, average number of people per home, average age, and education.

Likewise, another good site for demographic data and data validation is:

http://www.melissadata.com/dqt/index.htm

This site validates addresses, phone numbers, and email addresses, and performs name parsing, all through Web Services calls. This can help accelerate the ETL development process, because you do not have to develop the code or maintain large demographic databases onsite. Additionally, this site offers demographic data on income, media locations, reverse phone lookups, and mailing lists.

Finally, the Federal government maintains thousands of databases with data gathered from various agencies. This information can be coupled with internal data to make your data warehouse far more powerful. For example:

http://research.stlouisfed.org/fred2/
http://www.data.gov/catalog
http://www.census.gov/

These sites contain historic economic and demographic data the government has collected regarding income, population, interest rates, commodity prices, and housing sales, and the downloads are free.

The goal of data warehouse development should be to provide the tools and data for optimal decision making. To assure this goal is achieved, make sure external sources are also included in the initial and ongoing data warehouse implementation.

External/Unstructured Data in the Data Warehouse

Several issues relate to the use and storage of external and unstructured data in the data warehouse:

1. The frequency of its availability
2. Its total lack of discipline
3. Its unpredictability

Several methods exist to capture and store unstructured information, such as:

1. Near-line storage
2. Creating two stores of unstructured data

Meta Data and External Data

Meta data is vital because through it external data is registered, accessed, and controlled in the data warehouse environment.
The importance of meta data is best understood by noting what it typically encompasses:

Document ID
Date of entry into the warehouse
Description of the document
Source of the document
Date of source of the document
Classification of the document
Index words
Purge date
Physical location reference
Length of the document
Related references

Storing External/Unstructured Data

External data and unstructured data can actually be stored in the data warehouse if it is convenient and cost-effective to do so. Storing external and unstructured data requires considerable resources. By associating external data and unstructured data with a data warehouse, that data becomes available to all parts of the organization, such as finance, marketing, accounting, sales, engineering, and so forth.

Modeling and External/Unstructured Data

What is the role of the data model with respect to external data? See below (figure 8.6).

Archiving External Data

Every piece of information, external or otherwise, has a useful lifetime. Once past that lifetime, it is not economical to keep the information. An essential part of managing external data is deciding what the useful lifetime of the data is.

Comparing Internal Data to External Data

One of the most useful things to do with external data is to compare it to internal data over a period of time. The comparison gives management a unique perspective. For instance, being able to contrast immediate and personal activities against global activities and trends allows an executive to gain insights that are simply not possible elsewhere.
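
The comparison of internal to external data over time can be sketched in Python. This is only a minimal illustration under stated assumptions, not a prescribed method: the names growth, compare, and Comparison are hypothetical, all figures are made up, and both series are assumed to be ordered and keyed by the same period labels.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    period: str
    internal_growth: float  # fractional change vs. the previous period
    external_growth: float  # fractional change vs. the previous period
    gap: float              # internal growth minus external growth


def growth(series):
    """Map each period (except the first) to its period-over-period change.

    `series` is an ordered list of (period, value) pairs. Periods whose
    previous value is zero are skipped to avoid division by zero.
    """
    return {
        period: (value - prev_value) / prev_value
        for (_, prev_value), (period, value) in zip(series, series[1:])
        if prev_value != 0
    }


def compare(internal, external):
    """Line up internal and external growth for the periods both series share."""
    internal_growth = growth(internal)
    external_growth = growth(external)
    return [
        Comparison(
            period=period,
            internal_growth=internal_growth[period],
            external_growth=external_growth[period],
            gap=internal_growth[period] - external_growth[period],
        )
        for period in internal_growth
        if period in external_growth
    ]


# Hypothetical figures: internal monthly sales vs. an external market index.
internal_sales = [("2009-01", 100.0), ("2009-02", 110.0), ("2009-03", 99.0)]
external_index = [("2009-01", 200.0), ("2009-02", 210.0), ("2009-03", 210.0)]

for row in compare(internal_sales, external_index):
    print(
        f"{row.period}: internal {row.internal_growth:+.1%}, "
        f"external {row.external_growth:+.1%}, gap {row.gap:+.1%}"
    )
```

A positive gap suggests the internal measure grew faster than the external trend in that period, and a negative gap suggests it lagged. Comparing growth rates rather than raw values is one possible design choice, because internal and external series rarely share the same units.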