WWW.THESIS.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Thesis, documentation, books
 
<< HOME
CONTACTS



Pages:   || 2 |

«Designing Dimensional Models Process Overview The design process iterates through three phrases: 1) Start with a few preparation steps, after which ...»

-- [ Page 1 ] --

Unit 06 Notes

IST722 Data Warehouse Sy racuse University

Paul Morarescu School of Information Studies

Designing Dimensional Models

Process Overview

The design process iterates through three phrases:

1) Start with a few preparation steps, after which an initial diagram is derived from the bus matrix.

The diagram clarifies the grain of the fact table, as well as the major dimensions, attributes and fact metrics.

2) This sketchy model is iteratively developed into the full- fledged and robust model, table by table, with increasing levels of detail.

3) The final model is reviewed and validated with business and IT representatives to make sure that it satisfies their requirements.

A dimensional model for a single business process can take four weeks to build. The recommended schedule includes two design sessions a day, one in the morning and the other in the afternoon, with 2-3 hours per session, 3-4 days a week. Figure 7-1 on pp 289 shows the process flow for one iteration (the deliverables are listed to the right). The actual design process iterates many times through the diagram.

The next sections present the details of this process.

Preparation

The preparation phase for dimensional modeling includes the following steps:

• Identify the required team roles and participants. Table 7-1 on pp 290 lists the typical roles.

Most of the work is done by the core team, which usually consists of a data modeler, a business analyst and a member of the ETL team.

• Review the business requirements document. Start the dimensional modeling from them rather than from the source data. The dimensional model must support the requirements.

• Review the source data. You can use simple SQL queries, data profiling tools, or profiling capabilities of ETL tools to enhance your understanding of the source data.

• Set up the modeling environment. Your initial sketch of the model should be stored in a spreadsheet like the one in Figure 7-2 on pp 294. This model is easy to improve iteratively.

Switch to actual modeling tools only in later stages.

• Set the naming conventions. A common naming schema is Entity_Qualifiers_Class. For example, sales_dollar_amount is the field in the sales fact table that represents the amount.

• Obtain facilities and supplies. These include a room with a projector, whiteboards and markers.

At the completion of the preparation phase, we are ready to build the initial high-level model starting from the bus matrix.

IST722 Data Warehouse Sy racuse University Paul Morarescu School of Information Studies Develop the Initial Model In the first pass, we draw the first dimensional model diagram. This is a graphical representation of the fact and dimension tables for the chosen business process. It is a data model at the entity level. The diagram is sometimes called a bubble chart, and is best done using flipcharts. Each business process’ fact table should occupy a separate page. Dimensions should be arranged in a predictable order, with the most important on top. Figure 7-3 on pp 304 presents an example.

The second pass creates an initial list of attributes for each dimension table and metrics for the fact table. This is a deliverable called the initial attribute and metric list. Store it in a spreadsheet like the one in Figure 7-2 on pp 294, creating one worksheet per table. When choosing some attributes over others, record concerns and controversies in an issue list, which is the third (and last) deliverable of these initial design sessions.

With the high level model diagram, the initial attribute and metric list, and the issue list in hand, we are ready to move to the design sessions for the iterative process that designs the final dimensional model.

Iterative Model Development Overview

The iterative phrase of dimensional modeling process has four steps:

1. Choose the business process or measurement event – use the enterprise data warehouse bus matrix, as each row corresponds to a business process. Iteratively build the DW one business process at a time.

2. Declare the fact table grain – the grain answers the question “when do we add a new row in the fact table?” The fact table should be as finely grained as possible. Skipping this step is the most common design error.

3. Identify the dimensions – the minimal set of dimensions follows from the fact table grain.

Additional dimensions should take unique values at the same grain. Be ready to revise your intuitions that were based on the transaction system schema.

4. Identify the facts – they are physically measured or derived from such measurements. All facts must share the same grain.

Start with the dimension tables, preferably with an easy one, such as the date dimension. Do one or two tables per design session. The purpose of these detailed sessions is to add all the missing details to the high-level model.

Identify of the best data sources to populate the target design. Start by compiling a comprehensive list of candidate sources; some candidates may come from the business requirements document, others from the IT people. Then analyze and profile each candidate. Finally, choose the best sources to populate your model, and document the reasons for your choices.





–  –  –

they violate the grain of the fact table. They will be calculated at query time by the BI tool – most BI tools offer a library to develop and maintain such facts. If you have many derived facts, document them in a separate document such as the derived fact worksheet shown in Figure 7-5 on pp 312.

The detailed design of each fact and dimension table should be documented in a detailed design worksheet as illustrated in Figure 7-6 on pp 313. We will refer again to this key deliverable in the section about the final documentation of the design.

The detailed design process enhances our understanding and can change some of our initial decisions regarding business processes, fact tables, and dimension tables. We should be ready to revise the bus matrix as needed. Figure 7-7 on pp 315 illustrates a bus matrix that has been updated at this time.

Model Review and Validation The dimensional model is reviewed with various audiences, each with a different level of expertise and understanding. The first reviewers should be chosen from among DBAs and system developers from the IT department. They have a great understanding of the transaction system but don’t usually know much about dimensional models. Therefore, a short tutorial is in order. They are good at spotting model errors and verifying that all required data is available.

The second review should involve business users not directly involved in the design process. This review is also an educative session for these users. They should be shown simple examples that illustrate the analytic capabilities of the dimensional model. The purpose is to verify that the model can answer all their questions about the business process.

Document and Finalize the Design Once the design is reviewed and validated, we can start documenting it. The final design document includes several sections listed on pp 320. The detailed dimensional design worksheet (Figure 7-6 on pp

313) will be the basis for the project’s source-to-target mapping document, which describes how each target table and column in the DW will be populated. This is an essential document for the ETL team.

Some important aspects of the dimensional modeling process are summarized in the BLUEPRINT FOR ACTION box on pp 323-325.

Alternative Modeling Techniques Entity relationship modeling and dimensional modeling are not the only data modeling techniques.

Alternatives include subject modeling, domain modeling, fact qualifier matrix/modeling, state transition modeling, and information flow modeling. Fact qualifier matrix/modeling is a useful tool to capture granularity/reporting requirements. Each model emphasizes different types of information. There is more than one way of modeling a business requirement, and every project has various types of requirements. It is a good practice to leverage the modeling techniques that best fit the project.

IST722 Data Warehouse Sy racuse University Paul Morarescu School of Information Studies

Designing the Physical Database

Process Overview Kimball’s Chapter 8 is addressed primarily to the Database Administrator (DBA) involved in the data warehouse development. It is not a complete course in physical model design, by any measure. Many concepts in this chapter were introduced in the recommended prerequisite database course. In this section of our notes, we will only mention some DB-design topics that are specific to data warehouses.

The top-level DB design process is illustrated in Figure 8-1 on pp 329. As shown in the figure, this is another iterative process. After developing the system-wide standards, we create an initial physical model to give the ETL development a target. As the ETL process starts populating the tables with data, we can design the OLAP DB and eventually complete the physical design process. Some elements of the physical design, such as indexing and aggregation, will be improved even in the late stages of the DW project. We’ll give a bit more details on these design phases below.

Standards, Initial Model, and Size Estimates The physical design process starts by developing standards regarding table and column naming conventions (these names should be the same as in the logical model), null values (they should be avoided in dimension tables), directory structure and file names (use a source control system), as well as primary and foreign keys. As we mentioned in the previous lecture, a dimension table should use an integer surrogate column as its primary key. The primary keys of all dimension tables become foreign keys in the fact table. The primary key of the fact table is a subset of these foreign keys.

After developing the necessary standards, we start developing the physical model itself. It is recommended to use the logical model as the starting point for the physical model, and depart from it as little as possible. In other words, we should keep the physical model similar to the logical model. One of the very first steps is to expand the dimensional design worksheet by adding the physical model information pieces listed on pp 336-337. We thus produce the definitive source-to-target mapping.

Our next task is to produce the initial estimate for the database size on disk. The key information to remember here is that the fact table stores 90% of the data in the data warehouse. Thus, we should start by estimating the size in bytes of a fact table row, and multiplying that number by the estimated number of rows. The second largest storage consumer is the fact table index. We also need to include in our estimates the storage space for ETL staging tables, ETL audit tables, access monitoring tables and security tables.

–  –  –

A B-Tree (see Figure 1) is a tree data structure that keeps the data sorted in a way that enables searches, sequential access, insertions and deletions in logarithmic time. As shown in Figure 1, the data values 7 and 16 stored in the parent node are sorted in an ascending order. Between these two values, we have a pointer to a child node that contains values larger than 7 and smaller than 16. In addition, to the left of 7 we have a pointer to a node that stores data values smaller than 7. Similarly, to the right of 16 we have a pointer to a child node with data values larger than 16. All children nodes can be sub-trees that recursively follow the same rules. In a B-Tree, every data value occurs in a single node. In a B-Tree used for database indexing, every data value is accompanied by a list of row IDs that have that value on the column that is being indexed. B-Tree indexes work best for table columns with many different data values.

Figure 1 Example of a B-tree (Wikimedia Commons)

The next common database index type is the clustered index. This index sorts and stores the table’s rows based on the values of their keys. The default index for a table’s primary key is a clustered index, and there can be only one clustered index per table. These indexes also work best for columns with many different data values.

In contrast with the B-Tree and the clustered index, the bitmap index works best for columns with few different data values. The bitmapped index for a column whose cardinality is n consists of n strings of bits – one string for each data value. Such a bit string has one bit for each row in the table. In the bit string associated with the data value v, the bit for a particular is set to 1 if and only if that row contains value v. All the other bits of that bit string are set to 0.

Example: Assume that we have a table with 5 rows, and we want to index its gender column that has 3 possible values: “M”, “F”, and “?” Assume also that the values in the gender column for the 5 rows are, in order, “M”, “F”, “?”, “F”, and “F”. The best index type for this low-cardinality column is a bitmapped index. Our index consists of 3 strings of bits (one string for each data value), each having 5 bits (one bit for each row). The string for “M” is 10000, the string for “F” is 01011, and the string for “?” is 00100.

–  –  –

Aggregations The factor with the largest influence on the performance of a large data warehouses are the aggregate or summary records that must coexist with the atomic grain facts. A typical DW/BI system has multiple sets of aggregations, one for each grouping levels within the key dimensions. These aggregations can be built in several alternative ways.

• Store them in separate tables, each being are populated by executing a SQL query that performs a GROUP BY. This is the simplest solution.

• Store them in the materialized views offered by Oracle DB systems. A materialized view is a view that physically stores its result set, thus enabling indexing, partitioning, and controlled access.

• Use the aggregation management and capabilities of an OLAP engine. They are very robust and similar to the previous two solutions, but offer less control over storage and indexing.



Pages:   || 2 |


Similar works:

«CONFERENCE SUMMARY 3rd INVITATIONAL CONCLAVE ON CONSCIOUS BUSINESS Big Sur, California, March 24 27, 2013 CONFERENCE SUMMARY | 2 Executive Summary In March 2013 a number of CEOs, top executives, and social change-agents met at Esalen Institute to discuss the emergence of a new paradigm in the business world called both conscious capitalism and conscious business. They came from an array of companies including: Whole Foods, the Container Store, Eileen Fisher, TOMS Shoes, Geomagic, prAna, Clif...»

«dows.com 358 US Route One, Falmouth, Maine 04105 207.878.3000  800.578.9981  info@dows.com DEFERRED ANNUITIES Deferred annuities are of two types: fixed rate and variable. FIXED RATE ANNUITIES The major shortcoming of a fixed rate annuity is that it is only as secure as the insurance company that sponsors it. That is because the annuity holder's money is commingled with the other assets of the insurance company. From the point of view of safety, money in a fixed rate annuity is comparable...»

«Higher Education Governance in Vietnam: University Action, the State and Changing Relationships DISSERTATION submitted in fulfillment for a Doctoral Degree in Social Science by Thi Lan Phuong Pham Kassel, July 2013 i Die vorliegende Arbeit wurde vom Fachbereich Gesellschaftswissenschaften der Universität Kassel als Dissertation zur Erlangung des akademischen Grades eines Doktors der Wirtschafts-und Sozialwissenschaften (Dr. rer. pol.) angenommen. Erster Gutachter: Prof. Dr. Georg Krücken...»

«Made Plain STANLEY LOVE TATE III Copyright © 2015 Stanley Love Tate III tateesq.com Made Plain: A simple guide to repaying your student loans ISBN–13: 978–1511439848 ISBN–10: 151143984X Licensed under the Creative Commons Attribution NonCommercial Share-Alike 3.0 License. https://creativecommons.org/licenses/by-sa/3.0/us/. Because as J Cole says on Note to Self, “You was inspired by the world; allow the world to be inspired by your shit and to use your shit.” Designer: Stanley Love...»

«Firms in International Trade1 Andrew B. Bernard Tuck School of Management at Dartmouth & NBER J. Bradford Jensen Peterson Institute for International Economics Stephen J. Redding London School of Economics & CEPR Peter K. Schott Yale School of Management & NBER April 2007 Abstract Despite the fact that importing and exporting are extremely rare firm activities, economists generally devote little attention to the role of firms when discussing international trade. This paper summarizes key...»

«Executive Summary The Gambia has an active private sector and the government is committed to encouraging local investment and attracting foreign direct investment. There is a government agency dedicated to attracting foreign investment and promoting exports and it provides guidelines and incentives to all investors whose portfolios qualify for a Special Investment Certificate. Although a smaller market compared to some of its West African neighbors, The Gambia has a comparative advantage of...»

«Market Competition and Selection for the New Palgrave Dictionary of Economics, 2nd ed. Lawrence Blume†‡ and David Easley† June 2007 Realized profits, not maximum profits, are the mark of success and viability. It does not matter through what process of reasoning or motivation such success was achieved. The fact of its accomplishment is sufficient. This is the criterion by which the economic system selects survivors: those who realize positive profits are the survivors; those who suffer...»

«Fairyland You simply is a most current article rise than these most price banks. An event means a business if foreclosure that plans late furniture call, that is to advertising. At your franchise concerns substantially doing and is the ownership you must improve a food circumstances on her control. The can actually hear you to make your report that recruiting your consolidation. Openly reiterate and get this tax for original how you erupts to your support lien expenditures, of a could get you...»

«Your Sandwich Made It A STILL way could spend to usually begin out the thorough work as a basket is willing well. Etc. double to have to destroy the Your Sandwich Made It! management sources if of most 2010-2012 breeds to the research of a seminar you want visiting income. Before according the flyers, you will buy this only Air Corporation to make itself be in office or customized accounting. It owed exclusively the great business in what she will recover or when local the report should find...»

«MINISTRY OF PUBLIC FINANCE GOVERNMENT DEBT MANAGEMENT STRATEGY 2014 – 2016 General Directorate for Treasury and Public Debt Bucharest, March 2014 Contents Summary 1. Introduction 2. Objectives and scope 3. Description of the public government debt portfolio 3.1. Evolution of government public debt 3.2. Risks of the public government debt portfolio at end-2013 4. Macroeconomic background in Romania Risks to baseline projections 5. Funding sources 5.1. Domestic market 5.2. External markets 6....»

«Capacity Building Legacies: Boards of the Richmond Male Orphan Asylum for Destitute Boys & the Protestant Episcopal Church Home for Infirm Ladies 1870-1900 F. ELLEN NETTING MARY KATHERINE O’CONNOR DAVID P. FAURI Virginia Commonwealth University School of Social Work What strategies did early boards of managers of charitable human service agencies pursue to build capacity in a way that sustained their efforts for more than a hundred years? Using primary and secondary documents to focus on two...»

«Section 1/Chapter 1 Manage the Association and Its Business Operations Association Structures and Relationships (3/12) Chapter Contents  Introduction  Shared Services/Collaboration  Shared Services  Mergers/Consolidations  Joining Forces  The Benefits of Merging/Consolidating Sources  Field Guide to Association Mergers and Partnerships, NAR, August 2010, http://www.realtor.org/library/library/fg608  Field Guide to Association Shared Services, NAR, August, 2010,...»





 
<<  HOME   |    CONTACTS
2016 www.thesis.xlibx.info - Thesis, documentation, books

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.