Microsoft SQL Server 2008 R2 Specifications download pdf (Page 136)

116 CHAPTER 6 Scalable Data Warehousing

You design the data layout on the appliance to avoid or minimize data movement for parallel queries by using either a replicated or a distributed strategy for storage. When planning which strategy to implement, you consider the types of joins that the parallel queries require. Some tables require a replicated strategy, whereas others require a distributed strategy.
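To make the planning question concrete, here is a minimal Python sketch of how a join's storage layout decides whether rows must move between compute nodes. It is a simplified model that covers only inner equi-joins. `TableLayout`, `join_needs_movement`, and the table names are hypothetical illustrations, not Parallel Data Warehouse APIs. The rule that two distributed tables join locally only when both are distributed on the joined columns is a general MPP assumption, not something stated in this section. Outer joins need more care, because a replicated table on the preserved side of an outer join cannot simply be joined node by node.

```python
from dataclasses import dataclass
from typing import Optional

VALID_STRATEGIES = {"replicated", "distributed"}


@dataclass(frozen=True)
class TableLayout:
    """Hypothetical description of how one table is stored on the appliance."""
    name: str
    strategy: str                              # "replicated" or "distributed"
    distribution_column: Optional[str] = None  # required when strategy == "distributed"

    def __post_init__(self):
        if self.strategy not in VALID_STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")
        if self.strategy == "distributed" and self.distribution_column is None:
            raise ValueError(f"{self.name}: a distributed table needs a distribution column")


def join_needs_movement(left: TableLayout, left_col: str,
                        right: TableLayout, right_col: str) -> bool:
    """Simplified check for an INNER equi-join left.left_col = right.right_col.

    Returns True when the join cannot be answered from node-local data alone.
    """
    # A replicated table has a complete copy on every compute node, so each node
    # can join it against whatever rows of the other table it holds locally.
    if left.strategy == "replicated" or right.strategy == "replicated":
        return False
    # Assumption: two distributed tables are co-located only if both are
    # distributed on the joined columns, so equal key values land on the same node.
    if left.distribution_column == left_col and right.distribution_column == right_col:
        return False
    # Otherwise rows would have to be shuffled or broadcast between nodes first.
    return True


if __name__ == "__main__":
    sales = TableLayout("FactSales", "distributed", "CustomerKey")      # hypothetical
    customer = TableLayout("DimCustomer", "replicated")                # hypothetical
    returns = TableLayout("FactReturns", "distributed", "ProductKey")  # hypothetical
    print(join_needs_movement(sales, "CustomerKey", customer, "CustomerKey"))  # False
    print(join_needs_movement(sales, "ProductKey", returns, "ProductKey"))     # True
```

In this model, joining the fact table to a replicated dimension never requires movement, while joining two fact tables on a column that is not the distribution column of both does. That is why small dimension tables are natural candidates for replication.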

Replicated Strategy

For best performance, you can add small tables, such as dimension tables in a star schema, to Parallel Data Warehouse by using a replicated strategy. Parallel Data Warehouse makes a copy of the table on each compute node, as shown in Figure 6-3. You then perform the initial load of the table, followed by any subsequent inserts, updates, or deletes, as if you were working with a single table, without the need to manage each copy of the table. Parallel Data Warehouse handles all changes to the table for you. When a query performs a join on a replicated dimension, Parallel Data Warehouse joins the dimension to the portion of the fact table that exists on the same compute node. All compute nodes run the query in parallel and can find data very quickly because the complete dimension table is on each compute node.

FIGURE 6-3 Replicated strategy: all table rows are copied to each compute node.
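The following Python sketch models the replicated strategy in memory, under stated assumptions. A hypothetical `ReplicatedTable` keeps one full copy per compute node and applies every insert, update, or delete to all copies, so callers treat it as a single table. Fact rows are spread across nodes with a deterministic hash of a distribution column (an assumption about the mechanism; this section does not say how rows are placed). Each node then joins its local fact rows with its local dimension copy, and threads stand in for compute nodes. Every name here (`ReplicatedTable`, `distribute`, `local_join`, `parallel_join`, `NUM_NODES`) is hypothetical and illustrates the idea, not how Parallel Data Warehouse is implemented.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical number of compute nodes


def node_for(value, num_nodes=NUM_NODES):
    """Assumption: a deterministic hash of the distribution value picks the node."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


class ReplicatedTable:
    """Hypothetical model of a replicated table: one full copy per compute node.

    Callers work with one logical table; every change is applied to all copies.
    """

    def __init__(self, num_nodes=NUM_NODES):
        self.copies = [[] for _ in range(num_nodes)]

    def insert(self, row):
        for copy in self.copies:
            copy.append(dict(row))  # each node gets its own independent row

    def update(self, predicate, changes):
        for copy in self.copies:
            for row in copy:
                if predicate(row):
                    row.update(changes)

    def delete(self, predicate):
        self.copies = [[r for r in copy if not predicate(r)] for copy in self.copies]


def distribute(rows, dist_col, num_nodes=NUM_NODES):
    """Spread fact rows across nodes by hashing the distribution column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[dist_col], num_nodes)].append(row)
    return nodes


def local_join(fact_part, dim_copy, key):
    """Inner hash join of one node's fact rows with that node's dimension copy."""
    lookup = {d[key]: d for d in dim_copy}
    return [{**f, **lookup[f[key]]} for f in fact_part if f[key] in lookup]


def parallel_join(fact_nodes, dim_table, key):
    """Run local_join on every node at once; no rows cross node boundaries."""
    with ThreadPoolExecutor(max_workers=len(fact_nodes)) as pool:
        parts = list(pool.map(local_join, fact_nodes, dim_table.copies,
                              [key] * len(fact_nodes)))
    return [row for part in parts for row in part]  # concatenate node results


if __name__ == "__main__":
    dim_product = ReplicatedTable()
    dim_product.insert({"ProductKey": 1, "Name": "Bike"})
    dim_product.insert({"ProductKey": 2, "Name": "Helmet"})
    dim_product.update(lambda r: r["ProductKey"] == 2, {"Name": "Bike Helmet"})

    sales = [{"SaleId": i, "ProductKey": 1 + i % 2, "Amount": 10 * i} for i in range(1, 9)]
    fact_nodes = distribute(sales, "SaleId")  # distributed on SaleId, not the join key
    result = parallel_join(fact_nodes, dim_product, "ProductKey")
    print(len(result))  # 8
```

Because every node holds the complete dimension, the fact table can be distributed on any column and the join still finds all matches locally. Python threads do not give a real CPU speedup here; the sketch shows only the data layout, not performance.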

Distributed Strategy

One of the keys to performance in an MPP architecture is the distribution of large tables across multiple nodes, as shown in Figure 6-4. To distribute a fact table, you simply select a column from the table to use as the distribution column, and when data is loaded into the table, Parallel Data Warehouse automatically spreads the rows across all of the compute
