Distributed Design
- Designing a distributed system involves deciding where to place data and programs.
- We could therefore discuss where application programs are placed and where DBMSs are placed.
- However, that is not of interest to us.
- What we are really interested in is organizing the data:
- How to partition the data
- Where to place data partitions
- The objective is to make data access fast and efficient
- Achieved through locality of reference
- Place the data that users will use most often closest to them
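To make "partition the data, then place each partition close to its users" concrete, here is a minimal Python sketch of horizontal partitioning. It assumes a hypothetical EMPLOYEE table and a hypothetical mapping from each row's city to the site whose users query that row most; neither comes from the notes above.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE rows; "city" decides which site uses a row most often.
employees = [
    {"eno": "E1", "name": "Ann", "city": "Paris"},
    {"eno": "E2", "name": "Bob", "city": "Boston"},
    {"eno": "E3", "name": "Cho", "city": "Paris"},
    {"eno": "E4", "name": "Dev", "city": "Tokyo"},
]

# Hypothetical mapping: city -> site whose users access those rows most.
SITE_OF_CITY = {"Paris": "site_eu", "Boston": "site_us", "Tokyo": "site_asia"}

def fragment_by_site(rows, site_of_city):
    """Split rows into horizontal fragments, one per site (a selection on city)."""
    fragments = defaultdict(list)
    for row in rows:
        # Each city maps to exactly one site, so the fragments are disjoint.
        fragments[site_of_city[row["city"]]].append(row)
    return dict(fragments)

fragments = fragment_by_site(employees, SITE_OF_CITY)
for site, rows in fragments.items():
    print(site, [r["eno"] for r in rows])

# Completeness: the union of all fragments gives back every original row.
union = [r for rows in fragments.values() for r in rows]
assert sorted(r["eno"] for r in union) == ["E1", "E2", "E3", "E4"]
```

Each fragment would then be stored at the site named by its key, so most queries touch only local data.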
Three dimensions
- The organization of distributed systems can be investigated along three dimensions:
- Level of sharing
- Behavior of access patterns
- Level of knowledge on access pattern
Level of sharing
No (program or data) sharing
- Rarely used in sophisticated data environments
Data sharing only
- Programs are replicated where necessary
Program and data sharing
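- Programs and data need not be replicated: a program at one site can request a service from a program at another site, which may in turn access data at a third site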
We will examine architectures that support the last two "level of sharing" options
Access Patterns
To understand which users need which data, one must understand user and application access patterns
- What type of data do which types of users need?
- Where are the users located?
Static access patterns
- Uncommon in practice
- The distributed data environment is straightforward to design and manage
Dynamic access patterns
- More likely: users do not always have the same needs over time
- More difficult to anticipate
- The distributed data environment is difficult to design and manage
Our approach is to address static access patterns only
The static approach can serve as a basis for more complex dynamic approaches
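With static access patterns, the design can work from a fixed table of access frequencies. The Python sketch below assumes a hypothetical frequency matrix and uses the simplest possible heuristic, placing each fragment at the site that accesses it most. Real allocation models also weigh storage, update, and communication costs, so treat this as an illustration only.

```python
# Hypothetical static access-frequency matrix: freq[fragment][site] is the
# number of accesses per day that queries from `site` make to `fragment`.
freq = {
    "EMP_EU": {"site_eu": 120, "site_us": 15, "site_asia": 5},
    "EMP_US": {"site_eu": 10, "site_us": 200, "site_asia": 8},
    "PROJ": {"site_eu": 40, "site_us": 35, "site_asia": 90},
}

def place_fragments(freq):
    """Greedy heuristic: put each fragment at the site that accesses it most."""
    return {frag: max(by_site, key=by_site.get) for frag, by_site in freq.items()}

def remote_accesses(freq, placement):
    """Count the accesses that must cross the network under a placement."""
    return sum(
        count
        for frag, by_site in freq.items()
        for site, count in by_site.items()
        if site != placement[frag]
    )

placement = place_fragments(freq)
print(placement)
print("remote accesses:", remote_accesses(freq, placement))
```

Because the matrix never changes, this computation only has to be done once at design time; that is exactly what dynamic access patterns take away.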
Level of Knowledge on Access Pattern
- How much do we know about how users will access the data?
Again, knowledge of access patterns falls along a spectrum:
- No knowledge (hard to know how to distribute data)
- Partial knowledge
- Complete knowledge (helps us determine ideal placement of data)
Partial knowledge is the more usual case
- We have to do the best job we can with the initial design
- We will then have to observe usage patterns over time to get a better idea of data access patterns
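One way to picture "start with partial knowledge, then observe" is to combine design-time estimates with counts taken from an access log and recompute the preferred site. The Python sketch below assumes hypothetical estimates and a hypothetical log, and simply adds observed counts to the estimates; that refinement policy is an illustrative assumption, not a prescribed method.

```python
from collections import Counter

# Hypothetical design-time estimate of accesses per (fragment, site).
estimated = Counter({("PROJ", "site_eu"): 50, ("PROJ", "site_us"): 30})

# Hypothetical access log observed after deployment: (fragment, requesting site).
observed_log = [("PROJ", "site_us")] * 60 + [("PROJ", "site_eu")] * 10

def best_site(counts, fragment):
    """Return the site with the most recorded accesses to `fragment`."""
    candidates = {site: n for (frag, site), n in counts.items() if frag == fragment}
    return max(candidates, key=candidates.get)

initial_choice = best_site(estimated, "PROJ")

# Refine the estimate by adding the accesses actually observed.
refined = estimated + Counter(observed_log)
refined_choice = best_site(refined, "PROJ")

print("initial placement:", initial_choice)
print("refined placement:", refined_choice)
if refined_choice != initial_choice:
    print("candidate for migration:", initial_choice, "->", refined_choice)
```

Here the initial guess favours site_eu, but the observed usage shifts the preferred site to site_us, flagging PROJ as a candidate for relocation.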
These issues contribute to the design and placement of distributed data
Distribution Design
Top-down
- Mostly used when designing systems from scratch
- Mostly used in homogeneous systems
Bottom-up
- Used when the databases already exist at a number of sites
Top Down Design
The conceptual design of the data is an ER model of the whole enterprise
- Must anticipate new views and usages
- Must describe semantics of the data as used in the domain/enterprise
This is almost identical to typical DB design
- However, we are concerned with Distribution Design
- We need to place tables "geographically" on the network
- We also need to fragment tables
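Besides splitting a table by rows, top-down design can also split it by columns (vertical fragmentation). The Python sketch below assumes a hypothetical PROJ table with primary key pno. Each vertical fragment keeps the key, so the original table can be rebuilt by joining the fragments on it.

```python
# Hypothetical PROJ table; "pno" is the primary key.
proj = [
    {"pno": "P1", "pname": "Instrumentation", "budget": 150000, "loc": "Montreal"},
    {"pno": "P2", "pname": "Database Dev", "budget": 135000, "loc": "New York"},
]

def vertical_fragment(rows, key, attrs):
    """Project rows onto `key` plus `attrs`; the key is kept in every fragment."""
    return [{key: r[key], **{a: r[a] for a in attrs}} for r in rows]

def reconstruct(key, *fragments):
    """Rebuild the original rows by joining all fragments on the shared key."""
    joined = {}
    for frag in fragments:
        for r in frag:
            joined.setdefault(r[key], {}).update(r)
    return list(joined.values())

# e.g. budgets stored at an accounting site, names and locations elsewhere
proj1 = vertical_fragment(proj, "pno", ["budget"])
proj2 = vertical_fragment(proj, "pno", ["pname", "loc"])

assert reconstruct("pno", proj1, proj2) == proj
```

Keeping the key in every fragment is what makes the decomposition lossless; each fragment can then be placed at the site whose applications use those columns.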
Bottom Up
Top-down design is the choice when you have the liberty of starting from scratch
- Unfortunately, this is not usually the case
- Some element of bottom-up design is more common
Bottom-up design integrates independent or semi-independent schemas into a Global Conceptual Schema (GCS)
- Must deal with schema mapping issues
- May deal with heterogeneous integration issues
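Schema mapping in bottom-up design can be illustrated with two existing local databases that describe the same entity differently. The Python sketch below assumes two hypothetical local customer schemas and a hypothetical global CUSTOMER(id, name, city) schema; each mapping renames attributes and converts types, which is one small part of the mapping and heterogeneity problems listed above.

```python
# Two hypothetical pre-existing local schemas describing customers differently.
site_a_rows = [{"cust_id": 1, "cust_name": "Ann", "town": "Lyon"}]
site_b_rows = [{"CID": "7", "FullName": "Bob", "City": "Austin"}]  # id stored as text

# Hypothetical mappings into the GCS relation CUSTOMER(id: int, name: str, city: str).
# Each entry maps a global attribute -> (local attribute, type converter).
MAPPINGS = {
    "site_a": {"id": ("cust_id", int), "name": ("cust_name", str), "city": ("town", str)},
    "site_b": {"id": ("CID", int), "name": ("FullName", str), "city": ("City", str)},
}

def to_global(site, row):
    """Translate one local row into the global schema, converting types."""
    return {g: convert(row[local]) for g, (local, convert) in MAPPINGS[site].items()}

global_customers = [to_global("site_a", r) for r in site_a_rows] + [
    to_global("site_b", r) for r in site_b_rows
]
print(global_customers)
```

Queries against the GCS then see a single CUSTOMER relation, even though the underlying sites keep their own names and types.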