A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
This definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.
Datawarehousing arises in an organisation's need for reliable, consolidated, unique and integrated reporting and analysis of its data, at different levels of aggregation.
The practical reality of most organisations is that their data infrastructure is made up by a collection of heterogeneous systems. For example, an organisation might have one system that handles customer-relationship, a system that handles employees, systems that handles sales data or production data, yet another system for finance and budgeting data etc. In practice, these systems are often poorly or not at all integrated and simple questions like: "How much time did sales person A spend on customer C, how much did we sell to Customer C, was customer C happy with the provided service, Did Customer C pay his bills" can be very hard to answer, even though the information is available "somewhere" in the different data systems.
Another problem is that ERP systems are designed to support relevant operations. For example, a finance system might keep track of every single stamp bought; When it was ordered, when it was delivered, when it was paid and the system might offer accounting principles (like double bookkeeping) that further complicates the data model. Such information is great for the person in charge of buying "stamps" or the accountant trying to sort out an irregularity, but the CEO is definitely not interested in such detailed information, the CEO wants to know stuff like "What's the cost?", "What's the revenue?", "did our latest initiative reduce costs?".
Yet another problem might be that the organisation is, internally, in disagreement about which data is correct. For example, the sales department might have one view of its costs, while the finance department has another view of that cost. In such cases the organisation can spend unlimited time discussing who's got the correct view of the data.
It is partly the purpose of Datawarehousing to bridge such problems. It is important to note that in Datawarehousing the source data systems are considered as given: It is not the task of the datawarehousing consultant to figure out, that since the problem is that the CRM system identifies a person by initials, while the Employee-Time-Management system identifies a person by full name while the ERP system identifies a person by social security number; and since a person can change his name: things do not work and the organization should invest in and implement one or two new systems to handle CRM, ERP etc. in a more consistent manner.
Rather, the datawarehousing consultant is charged with making the data appear consistent, integrated and consolidated despite the problems in the underlying source systems. The datawarehousing consultant achieves this by employing different datawarehousing techniques, creating one or more new data repositories (i.e. the datawarehouse) whose data model(s) support the needed reporting and analysis.