Software Architecture Books Summary And Highlights -- Part 9 Data And Database

 

Data And Database

Highlights

When to break a database into multiple databases?

  • Change control
    • How many services are impacted by a change to a database table? If too many services depend on one table, that table is hard to decompose
    • Better to refactor the database schema into bounded contexts first, so that each table is directly accessed by only one service
  • Connection management
    • Breaking into multiple services may increase the number of connections to the db
    • One solution is to assign a connection quota to each service, either split evenly or weighted by each service's needs
  • Scalability, fault tolerance
    • Breaking into multiple databases can help increase scalability and fault tolerance
  • Database type optimization
    • Some kinds of data can be moved to a more suitable type of database
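
The connection quota idea above can be sketched as a small allocation function. This is only an illustration: the function name, the service names, and the weighting scheme are assumptions made for the example, not something taken from the book.

```python
def allocate_connections(max_connections, services, weights=None):
    """Split a database connection budget across services.

    With no weights, every service gets an equal share. With weights,
    shares are proportional to each service's weight. Connections left
    over after integer division go to the heaviest services first.
    (Hypothetical helper, for illustration only.)
    """
    if weights is None:
        weights = {name: 1 for name in services}
    total_weight = sum(weights[name] for name in services)
    quotas = {
        name: max_connections * weights[name] // total_weight
        for name in services
    }
    leftover = max_connections - sum(quotas.values())
    # Hand out the remainder one connection at a time, heaviest first.
    by_weight = sorted(services, key=lambda n: weights[n], reverse=True)
    for name in by_weight[:leftover]:
        quotas[name] += 1
    return quotas


even = allocate_connections(100, ["orders", "billing", "search"])
weighted = allocate_connections(
    100,
    ["orders", "billing", "search"],
    weights={"orders": 3, "billing": 1, "search": 1},
)
```

The quotas always add up to the full budget, so the database's connection limit is never oversubscribed by construction.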

SAH Ch 6 Pulling Apart Operational Data


How to split table ownership among multiple services

Several scenarios:

  1. Single ownership: one table is written by only one service; straightforward…
  2. Common ownership: one table is written by all services.
    1. Create a single wrapper service above that table
  3. Joint ownership: one table is written by some but not all services; 4 solutions
    1. Split the db for each service
    2. Allow multiple services to access the same db; define a domain db and let multiple services own it
    3. Delegate all writes and reads to one service. That service can be picked either by
      1. which service has the closer relationship to the data
      2. which service has the higher performance requirements
    4. Combine multiple services into one; may affect scalability
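
As a rough sketch of the three scenarios above, a table can be classified from the set of services that write to it. The function and the service names are hypothetical, and the rule simply mirrors the definitions in this list.

```python
def classify_ownership(table_writers, all_services):
    """Return 'single', 'common', or 'joint' for one table.

    table_writers: services that write to the table.
    all_services:  every service in the system.
    (Hypothetical helper; mirrors the scenario definitions above.)
    """
    writers = set(table_writers)
    if not writers:
        raise ValueError("table has no writing service")
    if len(writers) == 1:
        return "single"
    if writers == set(all_services):
        return "common"
    return "joint"


services = ["orders", "billing", "shipping"]
audit = classify_ownership(["orders", "billing", "shipping"], services)
invoice = classify_ownership(["billing"], services)
inventory = classify_ownership(["orders", "shipping"], services)
```

Running this over a writers-per-table inventory gives a first pass at which tables need one of the joint-ownership solutions.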

SAH Ch 9 Data Ownership and Distributed Transactions


When to merge multiple databases into a single one?

  • Data relationships: Are there foreign keys, triggers, or views that form close relationships between the tables? These are important for data consistency
  • Database transactions: Is a single transactional unit of work necessary to ensure data integrity and consistency?

SAH Ch 6 Pulling Apart Operational Data


Distributed Data Access Pattern

  • Access data through interservice communication
    • Pro: simple
    • Cons: low scalability, throughput, availability
  • Replicate the needed column data into each service’s own database
    • Pros:
      • good data access performance and scalability, fault tolerance;
      • no service dependency
    • Cons: data consistency, ownership and synchronization challenges
  • Access data through cache
    • Pros:
      • good performance and consistency;
      • Ownership is preserved
    • Cons:
      • hard to configure
      • Not scalable for high data volumes or high update rates
  • Multiple services share same database
    • Pros:
      • good performance and consistency;
      • no service dependency
    • Cons: keeping the bounded context, data ownership and data access security intact is challenging
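
A minimal in-memory sketch of the column replication pattern from the list above. The class and method names are invented for the example, and the synchronous callback is an assumption that stands in for whatever asynchronous messaging a real system would use.

```python
class InventoryService:
    """Owns the inventory count and pushes every change to subscribers."""

    def __init__(self):
        self._counts = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set_count(self, item_id, count):
        self._counts[item_id] = count
        # In a real system this would be an asynchronous message.
        for notify in self._subscribers:
            notify(item_id, count)


class CatalogService:
    """Keeps a read-only replica of the inventory count in its own store."""

    def __init__(self):
        self._inventory_replica = {}

    def on_inventory_changed(self, item_id, count):
        self._inventory_replica[item_id] = count

    def describe(self, item_id):
        # Reads never call InventoryService, so there is no runtime
        # dependency, but the replica can lag behind the owner.
        count = self._inventory_replica.get(item_id, 0)
        status = "in stock" if count > 0 else "out of stock"
        return f"{item_id}: {status}"


inventory = InventoryService()
catalog = CatalogService()
inventory.subscribe(catalog.on_inventory_changed)
inventory.set_count("sku-1", 3)
```

The read path shows the pro (no service dependency) and the replica dictionary shows the con: it is a second copy that has to be kept in sync.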

SAH Ch 10 Distributed Data Access


Managing Analytical Data

Data warehouse

cons:

  • Integration brittleness
    • Changing the production db schema also entails changes to the transformation and import logic
  • Extreme partitioning of domain knowledge
    • Couples all domains together: architects, developers, DBAs, and data scientists must all coordinate on data changes and evolution, forcing tight coupling between vastly different parts of the ecosystem.
  • Complexity
  • Synchronization creates bottlenecks
  • Limited functionality for intended purpose
    • most data warehouses failed because they didn’t deliver business value commensurate to the effort required to create and maintain the warehouse.

Data Lake

Does no transformations on load, giving business users access to analytical data in its natural format; the data typically still requires transformation and massaging for their purpose.

Load and transform instead of transform and load
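
A toy sketch of "load first, transform later", assuming raw JSON lines as the stored format. The `raw_zone` list stands in for lake storage, and all names and fields are made up for the example.

```python
import json

raw_zone = []  # stands in for the lake's raw storage


def load(event_json):
    """Store the record exactly as received; no cleanup on the way in."""
    raw_zone.append(event_json)


def total_sales_by_region(raw_records):
    """Consumer-side transformation, applied only when the data is read."""
    totals = {}
    for line in raw_records:
        record = json.loads(line)
        region = record["region"].strip().upper()  # normalize at read time
        totals[region] = totals.get(region, 0) + record["amount"]
    return totals


load('{"region": " eu", "amount": 10}')
load('{"region": "EU", "amount": 5}')
load('{"region": "us", "amount": 7}')
totals = total_sales_by_region(raw_zone)
```

Each consumer carries its own cleanup logic here, which is where the transformation and massaging effort ends up.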

Cons

  • Difficulty in discovery of proper assets
  • Still technically partitioned instead of partitioned by domain

Data Mesh

principles:

  1. Domain ownership of data
    1. Data is owned and shared by the domains that are most intimately familiar with the data.
  2. Data as a product
    1. puts in place the organizational roles and success metrics necessary to ensure that domains provide their data in a way that delights the experience of data consumers across the organization.
  3. Self-serve data platform
    1. Makes developers' lives easier
  4. Computational federated governance
    1. organization-wide governance requirements—such as compliance, security, privacy, and quality of data, as well as interoperability of data products—are met consistently across all domains.

SAH Ch 14 Managing Analytical Data


Reporting

Several ways:

  1. Export data for reporting through an API, e.g. a batch API, or let the API write to a file
  2. Separate data sync code, owned by the domain team; good decoupling
  3. Send data change events to a reporting service; hard to scale
  4. Build reporting data from backup data
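
The first option (an API that writes a batch export to a file) might look roughly like this. The function name, field names, and CSV format are assumptions made for the sketch.

```python
import csv
import io


def export_orders_for_reporting(orders, out):
    """Write only the fields the reporting side needs to a file-like object.

    The internal row shape stays private to the service; the export
    format is the only contract. (Hypothetical names and fields.)
    """
    writer = csv.writer(out)
    writer.writerow(["order_id", "total"])
    for order in orders:
        writer.writerow([order["order_id"], order["total"]])


internal_rows = [
    {"order_id": 1, "total": 9.5, "internal_flag": "x"},
    {"order_id": 2, "total": 20, "internal_flag": "y"},
]
buffer = io.StringIO()  # a real export would open a file instead
export_orders_for_reporting(internal_rows, buffer)
exported_lines = buffer.getvalue().splitlines()
```

Because the reporting side only sees the export, the service can change its internal schema without breaking reports as long as the exported columns stay stable.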

MSV Ch 5 Splitting the monolith


Related Chapters

SAH Ch 6 Pulling Apart Operational Data

SAH Ch 9 Data Ownership and Distributed Transactions

SAH Ch 10 Distributed Data Access

SAH Ch 14 Managing Analytical Data

MSV Ch 5 Splitting the monolith
