Top 3 Products & Services


Dated: Aug. 13, 2004

Related Categories

By Najmi

A data warehouse is a structured extensible environment designed for the analysis of non-volatile data, logically and physically transformed from multiple source applications to align with business structure, updated and maintained for a long time period, expressed in simple business terms, and summarized for quick analysis.

A data warehouse is managed data situated after and outside the operational systems Data warehousing has quickly evolved into a unique and popular business application class. Early builders of data warehouses already consider their systems to be key components of their IT strategy and architecture. Numerous examples can be cited of highly successful data warehouses developed and deployed for businesses of all sizes and all types. Hardware and software vendors have quickly developed products and services that specifically target the data warehousing market.

The primary concept of data warehousing is that the data stored for business analysis can most effectively be accessed by separating it from the data in the operational systems. Many of the reasons for this separation have evolved over the years. In the past, legacy systems archived data onto tapes as it became inactive and many analysis reports ran from these tapes or mirror data sources to minimize the performance impact on the operational systems.

These reasons to separate the operational data from analysis data have not significantly changed with the evolution of the data warehousing systems, except that now they are considered more formally during the data warehouse building process. Advances in technology and changes in the nature of business have made many of the business analysis processes much more complex and sophisticated. In addition to producing standard reports data warehousing systems support very sophisticated online analysis including multi-dimensional analysis.

Data Warehousing vs. Business Intelligence

The Differentials Are:

Data Warehousing is a way of storing data and creating information through leveraging data marts. DM's are segments or categories of information and/or data that are grouped together to provide 'information' into that segment or category. DW does not require BI to work. Reporting tools can generate reports from the DW.

Business Intelligence is the leveraging of DW to help make business decisions and recommendations. Information and data rules engines are leveraged here to help make these decisions along with statistical analysis tools and data mining tools.

You will find that BI is much like ERP in that it can be extremely expensive and invasive to your firm and there is a wide range between the offerings - low end to high end - that facilitates the pricing. There is a long list of tools to select from. There are also services that provide this as an outsource. Some of these services will allow you to eventually 'own' the solution and in-source it at a future date. Like anything else, this comes at a price. This scenario works well for those who do not have a high caliber IT staff and would like to get results with a short ramp up time, basically because the system is already built. Your rules and reports just have to be generated. That is a bit oversimplified, but you get the picture.

Work Data Warehousing

Integrating Data From More Than One Operational System

Data warehousing systems are most successful when data can be combined from more than one operational system. When the data needs to be brought together from more than one source application, it is natural that this integration be done at a place independent of the source applications. Before the evolution of structured data warehouses, analysts in many instances would combine data extracted from more than one operational system into a single spreadsheet or a database. The data warehouse may very effectively combine data from multiple source applications such as sales, marketing, finance, and production. Many large data warehouse architectures allow for the source applications to be integrated into the data warehouse incrementally.

The primary reason for combining data from multiple source applications is the ability to cross-reference data from these applications. Nearly all data in a typical data warehouse is built around the time dimension. Time is the primary filtering criterion for a very large percentage of all activity against the data warehouse. An analyst may generate queries for a given week, month, quarter, or a year. Another popular query in many data warehousing applications is the review of year-on-year activity. For example, one may compare sales for the first quarter of this year with the sales for first quarter of the prior years. The time dimension in the data warehouse also serves as a fundamental cross-referencing attribute. For example, an analyst may attempt to access the impact of a new marketing campaign run during selected months by reviewing the sales during the same periods. The ability to establish and understand the correlation between activities of different organizational groups within a company is often cited as the single biggest advanced feature of the data warehousing systems.

The data warehouse system can serve not only as an effective platform to merge data from multiple current applications; it can also integrate multiple versions of the same application. For example, an organization may have migrated to a new standard business application that replaces an old mainframe-based, custom-developed legacy application. The data warehouse system can serve as a very powerful and much needed platform to combine the data from the old and the new applications. Designed properly, the data warehouse can allow for year-on-year analysis even though the base operational application has changed.

Data Is Mostly Non-Volatile

Another key attribute of the data in a data warehouse system is that the data is brought to the warehouse after it has become mostly non-volatile. This means that after the data is in the data warehouse, there are no modifications to be made to this information. For example, the order status does not change, the inventory snapshot does not change, and the marketing promotion details do not change. This attribute of the data warehouse has many very important implications for the kind of data that is brought to the data warehouse and the timing of the data transfer.

Let us further review what it means for the data to be non-volatile. In an operational system the data entities go through many attribute changes. For example, an order may go through many statuses before it is completed. Or, a product moving through the assembly line has many processes applied to it. Generally speaking, the data from an operational system is triggered to go to the data warehouse when most of the activity on these business entity data has been completed. This may mean completion of an order or final assembly of an accepted product. Once an order is completed and shipped, it is unlikely to go back to backorder status. Or, once a product is built and accepted, it is unlikely to go back to the first assembly station. Another important example can be the constantly changing data that is transferred to the data warehouse one snapshot at a time. The inventory module in an operational system may change with nearly every transaction; it is impossible to carry all of these changes to the data warehouse. You may determine that a snapshot of inventory carried once every week to the data warehouse is adequate for all analysis. Such snapshot data naturally is non-volatile.

It is important to realize that once data is brought to the data warehouse, it should be modified only on rare occasions. It is very difficult, if not impossible, to maintain dynamic data in the data warehouse. Many data warehousing projects have failed miserably when they attempted to synchronize volatile data between the operational and data warehousing systems.

Traditional Approaches To Historical Data

Throughout the history of systems development, the primary emphasis had been given to the operational systems and the data they process. It is not practical to keep data in the operational systems indefinitely; and only as an afterthought was a structure designed for archiving the data that the operational system has processed. The fundamental requirements of the operational and analysis systems are different: the operational systems need performance, whereas the analysis systems need flexibility and broad scope. It has rarely been acceptable to have business analysis interfere with and degrade performance of the operational systems.

Data From Legacy Systems

In the 1970's virtually all business system development was done on the IBM mainframe computers using tools such as Cobol, CICS, IMS, DB2, etc. The 1980's brought in the new mini-computer platforms such as AS/400 and VAX/VMS. The late eighties and early nineties made UNIX a popular server platform with the introduction of client/server architecture.

Despite all the changes in the platforms, architectures, tools, and technologies, a remarkably large number of business applications continue to run in the mainframe environment of the 1970's. By some estimates, more than 70 percent of business data for large corporations still resides in the mainframe environment. There are many reasons for this. The most important reason, and one that is particularly relevant to our topic, is that over the years these systems have grown to capture the business knowledge and rules that are incredibly difficult to carry to a new platform or application.

These systems, generically called legacy systems, continue to be the largest source of data for analysis systems. The data that is stored in DB2, IMS, VSAM, etc. for the transaction systems ends up in large tape libraries in remote data centers. An institution will generate countless reports and extracts over the years; each designed to extract requisite information out of the legacy systems. In most instances, IS/IT groups assume responsibility for designing and developing programs for these reports and extracts. The time required to generate and deploy these programs frequently turns out to be longer than the end users think they can afford.

Extracted Information On The Desktop

During the past decade, the sharply increasing popularity of the personal computer on business desktops has introduced many new options and compelling opportunities for business analysis. The gap between the programmer and end user has started to close as Business Analysts now have at their fingertips many of the tools required to gain proficiency in the use of spreadsheets for analysis and graphic representation. Advanced users will frequently use desktop database programs that allow them to store and work with the information extracted from the legacy sources. Many desktop reporting and analysis tools are increasingly targeted towards end users and have gained considerable popularity on the desktop.

The downside of this model for business analysis is that it leaves the data fragmented and oriented towards very specific needs. Each individual user has obtained only the information that he or she requires. Not being standardized, the extracts are unable to address the requirements of multiple users and uses. The time and cost involved in addressing the requirements of only one user prove prohibitive. This approach to data management assumes the end user has the time to expend on managing the data in the spreadsheets, files, and databases. While many of these users may be proficient at data management, most undertake these tasks as a necessity. And given the choice, most users would find it more efficient to focus on the actual analysis and the tools available to them.

Decision-Support And Executive Information Systems

Another category of popular analysis systems has been decision support systems and executive information systems. Decision support systems tend to focus more on detail and are targeted towards lower to mid-level managers. Executive information systems have generally provided a higher level of consolidation and a multi-dimensional view of the data, as high-level executives need more the ability to slice and dice the same data than to drill down to review the data detail.

These two similar and overlapping categories are perhaps the closest precursors to the data warehousing systems. Yet the high price of their development and the coordination required for their production made them an elite product that never entered the mainstream. The following are some characteristics generally associated with decision support or executive information systems:

These systems have data in descriptive standard business terms, rather than in cryptic computer field's names. Non-technical users design data names and data structures in these systems for use. The data is generally preprocessed with the application of standard business rules such as how to allocate revenue to products, business units, and markets.

Consolidated views of the data such as product, customer, and market are available. Although these systems will at times have the ability to drill down to the detail data, rarely are they able to access all the detail data at the same time.

The most important factor in the evolution of data warehousing has been the sharply increasing power of computer hardware. Along with the increase in this power, their prices have fallen just as sharply. Gordon Moore, co-founder of Intel, predicted that the capacities of a microprocessor will double every 18 months. This has not only held true for the processor but also for other components of the computer. While desktop computers today are more powerful than the mainframes of yesterday, an inexpensive server possesses power that was difficult to imagine just a decade ago.

The Pentium 4 and Alpha processors have brought incredible power to the commodity computer market. Sophisticated processor hardware architectures such as symmetric multi-processing have come to the mainstream computing with inexpensive machines. Higher capacity memory chips, a key component influencing the performance of a data warehouse system, are now available at very low prices. Now it is possible to have a moderately priced machine with 1 or 2 gigabytes of memory. Computer Bus such as PCI and controller interfaces such as Ultra SCSI have made I/O incredibly fast. Last but not the least, the disk drive has shrunk to hold amazing amounts of information. Just two decades ago, it would have taken a roomful of disk drives to store information that can now be easily stored on a single one-inch high disk drive.

Desktop Power Increasing

Entering the market as a novelty computer in early eighties, the personal computer has become the hotbed in innovation during the past decade. The personal computer was initially used for word processing and other minor tasks with no links to primary analytical functions. With the help of innovations such as powerful personal productivity software, graphical interface, and responsive business applications, the personal computer has become the focal point of all computing today. The powerful desktop hardware and software has allowed for development of the client/server or multi-tier computing architecture. Almost all data warehouses are accessed by personal computer based tools. These tools vary from very simple query capabilities available with most productivity packages to incredibly powerful graphical multi-dimensional analysis tools. Without the wide array of choices available for a data warehouse access, data warehousing would not have evolved so quickly.

Ever Increasing Power Of Server Software

Server operating systems such as Windows NT and Unix has brought mission-critical stability and powerful features to the distributed computing environment. The operating system software has become very feature-rich and powerful, as the cost has been going down steadily. With this combination, sophisticated operating system concepts such virtual memory, multi-tasking, and symmetric multi-processing are now available on inexpensive operating platforms. Operating systems such as Windows NT have made these powerful systems very easy to set up and operate reducing the total cost of ownership of these powerful servers.

Explosion Of Intranets And Web Based Applications

The most important development in computing since the advent of the personal computer is the explosion of Internet and Web based applications. Somewhat after the fact, the business community has quickly jumped onto the Internet bandwagon.

One of the most exciting fields in computing industry today is the development of Intranet applications. Intranets are private business networks that are based on the Internet standards, although they are designed to be used internally. The Internet/Intranet trend has very important implications for data warehousing applications. First, data warehouses can be available world wide on public/private network at much lower cost. This availability minimizes the need to replicate data across diverse geographical locations. Second, this standard has allowed the web server to provide a middle tier where all the heavy-duty analysis takes place before it is presented to the web-browsing client to use.

Change In The Nature Of The Business

Another very significant influence on evolution of data warehousing science is the fundamental changes in the business organization and structure during late eighties and early nineties. The emergence of a vibrant global economy has profoundly changed the information demands made by corporations in the United States and worldwide. Corporations have found markets for their products globally while competing with other companies in vastly different cultures and economic environments. The mergers and acquisition of businesses have crossed the country boundaries.

Emergence Of Standard Business Applications

Another factor that is fast becoming an important variable in data warehousing equations is the emergence of vendors with popular business application suites. Led by wildly popular German software vendor SAP AG, flexible business software suites adapted to the particulars of a business have become a very popular way to move to a sophisticated multi-tier architecture. Other vendors such as Baan, PeopleSoft, and Oracle have likewise come out with suites of software that provide different strengths but have comparable functionality.

The emergence of these application suites has a direct bearing on the increased use of data warehousing in that they are increasingly able to provide standard applications that are replacing existing custom developed legacy applications. In the near future, almost every data warehouse is likely to derive data from one of these application sources rather than the customized extraction from legacy systems. Further, there are significant initiatives at these vendors to make transaction data easily available to data warehousing systems. To the extent that these standard applications have extensive customization features, data acquisition from these applications can be much simpler than from the mainframe systems.

Business Use Of A Data Warehouse

No discussion of the data warehousing systems is complete without review of the type of activity supported by a data warehouse. Some of the activity against data warehouses is predefined and not much different from traditional analysis activity. Other processes such as multi-dimensional analysis and information visualization were not available with traditional analysis tools and methods.

There is a very interesting phenomenon that is observed with many data warehousing projects. The users of a new data warehouse only wish to get the information that they were able to get using the old tools and methods. They wish to replicate their queries and reports with the data warehouse and make sure that all the numbers match. Often there is as much apprehension of the new tools and the data warehouse as there is excitement. It is only after using the new data warehouse for a period of time that they start to explore and discover the new capabilities that are available to them. Soon after, they start to have significant input into the data warehouse enhancement process and they happily become the mentors for the new users.

Tools To Be Used Against The Data Warehouse

One of the objectives of the data warehouse is to make it as flexible and as open as possible. It is not desirable to set a steep entry price in terms of software and training for using the data warehouse. The data warehouse should be accessible by as many end-user tools and platforms as possible. Yet, it is not possible to make every feature of the data warehouse available from every end user tool.

Low-end tools such as simple query capability built into most spreadsheets may be adequate for a user that only needs to quickly reference the data warehouse. Other users may require the use of the most powerful multi-dimensional analysis tools. The data warehouse administrators need to identify the tools that are supported for access to the data warehouse and the capabilities that are available using these different tools. There can be a progression path to the higher-level tools for the data warehouse users. A user can start with a low-level tool that is already familiar to him or her. After becoming familiar with the data warehouse he or she may be able to justify the cost and effort involved with using a more complex tool.

In most data warehousing projects, there is a need to select a preferred data warehouse access tool for the most active users. A small number of users generate most of the analysis activity against the data warehouse. The data warehouse performance can be tuned to the requirements of the tool appropriate for these active users. This tool can be used for training and demonstration of the data warehouse.

Standard Reports And Queries

Many users of the data warehouse need to access a set of standard reports and queries. It is desirable to periodically automatically produce a set of standard reports that are required by many different users. When these users need a particular report, they can just view the report that has already been run by the data warehouse system rather than running it themselves. This facility can be particularly useful for reports that take a long time to run.

Such a facility would require report server software. It is likely that these reports can be accessed only using the client program for that system. This facility would need to work with or be part of the preferred data warehouse access tool previously mentioned. Many end user query and analysis tools now include server software that can be run with the data warehouse to serve reports and query results. These tools are now providing a web interface to the reports. In many data warehouse systems, this report and query server becomes an essential facility. The data warehouse users and administrators constantly need to consider any reports that are candidates to become standard reports for the data warehouse. Frequently, individual users may develop reports that can be used by other users.

In addition to standard reports and queries, sometimes it is useful to share some of the advanced work done by other users. A user may produce advanced analysis that can be parameterized or otherwise adapted by other users in different parts of the same organization or even in organizations.

Queries Against Summary Tables

For example, in a typical data warehouse, the product summary view may account for a very large number of queries where different users select different products and the time periods for product sales and profit margin queries. These queries provide quick response and they are very simple to build. Advanced users typically attach a pivot table in their analysis tool to data warehouse summary tables for simple multi-dimensional analysis.

Data Mining In The Detail Data

Even though data mining in the detail data may account for a very small percentage of the data warehouse activity, the most useful data analysis might be done on the detail data. The reports and queries off the summary tables are adequate to answer many "what" questions in the business. The drill down into the detail data provides answers to "why" and "how" questions.

Data mining is an evolving science. A data-mining user starts with summary data and drills down into the detail data looking for arguments to prove or disprove a hypothesis. The tools for data mining are evolving rapidly to satisfy the need to understand the behavior of business units such as customers and products.

Interface With Other Data Warehouses

The data warehouse system is likely to be interfaced with other applications that use it as the source of operational system data. A data warehouse may feed data to other data warehouses or smaller data warehouses called data marts.

The operational system interfaces with the data warehouse often become increasingly stable and powerful. As the data warehouse becomes a reliable source of data that has been consistently moved from the operational systems, many downstream applications find that a single interface with the data warehouse is much easier and more functional than multiple interfaces with the operational applications. The data warehouse can be a better single and consistent source for many kinds of data than the operational systems. It is however, important to remember that the much of the operational state information is not carried over to the data warehouse. Thus, data warehouse cannot be source of all operation system interfaces.

Now that you've gotten free know-how on this topic, try to grow your skills even faster with online video training. Then finally, put these skills to the test and make a name for yourself by offering these skills to others by becoming a freelancer. There are literally 2000+ new projects that are posted every single freakin' day, no lie!

Previous Article

Next Article

Ali Baba's Comment
Just one word "Brilliant". Keep it coming Mr. Admin.
14 Wed Sep 2011
Admin's Reply:

 we will and thank you very much ali 

Dollie's Comment
I am totally wowed and peraperd to take the next step now.
08 Thu Sep 2011
Admin's Reply:

 Good to know that Dollie :) best of luck dear.

Laquisha's Comment
Check that off the list of thgnis I was confused about.
07 Wed Sep 2011
Admin's Reply:

Laquisha hopefully this article help you out from your problem.

PS : We need your support and suggestion please read this article "TechiWarehouse is in trouble

Kaylee's Comment
Always the best ctoennt from these prodigious writers.
07 Wed Sep 2011
Admin's Reply:

 Thank you Kaylee

PS : We need your support and suggestion please read this article "We need your help

rajesh's Comment
respected sir pl's give me answer -relince give me 5 ip and my offece pc 30 ok so how to use 5 ip to 30 pc internet use so give me solution
02 Fri Jul 2010
Admin's Reply:

Why not just use a simply router to do the trick?