Data Management CHAPTER 6 121
Query Processing
Query processing in Parallel Data Warehouse is more complex than in an SMP data warehouse
because processing must manage high availability, parallelization, and data movement
between nodes. In general, Parallel Data Warehouse’s control node follows these steps to
process a query (shown in Figure 6-5):
1. Parse the SQL statement.
2. Validate and authorize the objects.
3. Build a distributed execution plan.
4. Run the execution plan.
5. Aggregate query results.
6. Send results to the client application.
[Figure: a client sends a user query to the appliance, whose nodes are labeled Management, Control, Landing Zone, Backup, and Compute. The compute nodes process query plan operations in parallel, the query results are aggregated, and the query results are returned to the client.]
FIGURE 6-5 Query processing steps
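The six steps can be sketched as a toy pipeline in Python. This is only an illustration of the control flow, not Parallel Data Warehouse code: the node names, the sample rows, the one-verb "SUM <column>" query format, and every function name below are hypothetical, and authorization is left out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each compute node holds one slice of a distributed table.
COMPUTE_NODES = {
    "compute1": [{"region": "east", "amount": 10}, {"region": "west", "amount": 5}],
    "compute2": [{"region": "east", "amount": 7}],
    "compute3": [{"region": "west", "amount": 3}, {"region": "east", "amount": 1}],
}
KNOWN_COLUMNS = {"region", "amount"}


def parse(query):
    # Step 1: stand-in for SQL parsing; accepts only "SUM <column>".
    verb, column = query.split()
    if verb.upper() != "SUM":
        raise ValueError(f"unsupported query: {query!r}")
    return {"op": "sum", "column": column}


def validate(parsed):
    # Step 2: check that the referenced column exists (authorization omitted).
    if parsed["column"] not in KNOWN_COLUMNS:
        raise KeyError(parsed["column"])


def build_plan(parsed):
    # Step 3: the distributed plan is one partial-sum operation per compute node.
    return [(node, parsed["column"]) for node in COMPUTE_NODES]


def run_on_node(step):
    # Work done by a single compute node on its own slice of the data.
    node, column = step
    return sum(row[column] for row in COMPUTE_NODES[node])


def process_query(query):
    parsed = parse(query)
    validate(parsed)
    plan = build_plan(parsed)
    with ThreadPoolExecutor() as pool:
        # Step 4: run the plan operations in parallel, one task per node.
        partials = list(pool.map(run_on_node, plan))
    total = sum(partials)  # Step 5: aggregate the partial results.
    return total           # Step 6: hand the result back to the client.


print(process_query("SUM amount"))  # 26
```

The point of the sketch is the division of labor: the coordinating function never scans rows itself; it only plans, dispatches, and combines the partial results.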
A query with a simple join on columns of replicated tables or distribution columns of
distributed tables does not require the transfer of data between compute nodes before
executing the query. By contrast, a more complex join that includes a nondistribution column of a
distributed table does require Parallel Data Warehouse to copy data among the distributions
before executing the query.
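As a rough illustration, the rule of thumb in this paragraph can be written as a small check. The sketch below reads the paragraph literally and is not the product's planner logic: the Table class, the table and column names, and the function names are all hypothetical, and a real optimizer weighs more than this one rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    # Hypothetical convention: None means the table is replicated.
    distribution_column: Optional[str] = None


def side_is_aligned(table, join_column):
    # A side needs no movement if the table is replicated, or if the join
    # uses the distribution column of a distributed table.
    if table.distribution_column is None:
        return True
    return join_column == table.distribution_column


def join_needs_data_movement(left, left_col, right, right_col):
    # Literal reading of the paragraph: movement is needed as soon as one
    # side joins on a nondistribution column of a distributed table.
    return not (side_is_aligned(left, left_col) and side_is_aligned(right, right_col))


orders = Table("orders", distribution_column="order_id")
line_items = Table("line_items", distribution_column="order_id")
dates = Table("dates")        # replicated
regions = Table("regions")    # replicated

# Join on the distribution columns of both distributed tables.
print(join_needs_data_movement(orders, "order_id", line_items, "order_id"))     # False
# Join on columns of two replicated tables.
print(join_needs_data_movement(dates, "region_id", regions, "region_id"))       # False
# Join that uses a nondistribution column of a distributed table.
print(join_needs_data_movement(orders, "customer_id", line_items, "order_id"))  # True
```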
Data Load Processing
The design of data load processing in Parallel Data Warehouse takes full advantage of the
parallel architecture to move data to the compute nodes. You have several options for loading
data into your data warehouse. You can use your ETL process to copy files to the Parallel