Daniel

Application Architecture Guide

Application architecture fundamentals

| Traditional on-premises | Modern cloud |
| ----------- | ----------- |
| Monolithic | Decomposed |
| Designed for predictable scalability | Designed for elastic scale |
| Relational database | Polyglot persistence (mix of storage technologies) |
| Synchronous processing | Asynchronous processing |
| Design to avoid failure (MTBF) | Design for failure (MTTR) |
| Occasional large updates | Frequent small updates |
| Manual management | Automated self-management |
| Snowflake servers | Immutable infrastructure |

MTBF is mean time between failures; MTTR is mean time to recovery.

Architecture styles

Overview

N-tier is a traditional architecture for enterprise applications. Dependencies are managed by dividing the application into layers that perform logical functions, such as presentation, business logic, and data access. N-tier is a natural fit for migrating existing applications that already use a layered architecture. For this reason, N-tier is most often seen in infrastructure as a service (IaaS) solutions, or applications that use a mix of IaaS and managed services.

For a purely PaaS solution, consider a Web-Queue-Worker architecture. In this style, the application has a web front end that handles HTTP requests and a back-end worker that performs CPU-intensive tasks or long-running operations. The front end communicates with the worker through an asynchronous message queue.
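
A minimal local sketch of this style, with Python's standard-library queue standing in for a managed message queue (the task shape and the order-processing work are illustrative):

```python
import queue
import threading

# Stands in for a managed message queue between front end and worker.
task_queue: "queue.Queue[dict]" = queue.Queue()

def web_front_end(order_id: int) -> str:
    """Handles an HTTP-style request: enqueue the work, respond immediately."""
    task_queue.put({"order_id": order_id})
    return f"Order {order_id} accepted"  # no waiting for the worker

def worker() -> None:
    """Back-end worker: dequeues and performs the long-running operation."""
    while True:
        task = task_queue.get()
        print(f"Processing order {task['order_id']} ...")  # CPU-intensive work here
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
print(web_front_end(42))
task_queue.join()  # wait for the worker to drain the queue before exiting
```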

A microservices application is composed of many small, independent services. Each service implements a single business capability. Services are loosely coupled, communicating through API contracts.

Event-driven architectures use a publish-subscribe (pub/sub) model, where producers publish events, and consumers subscribe to them. The producers are independent from the consumers, and consumers are independent from each other. Consider an event-driven architecture for applications that ingest and process a large volume of data with very low latency, such as IoT solutions. The style is also useful when different subsystems must perform different types of processing on the same event data.
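
A toy in-process version of the pub/sub model (in production a message broker plays the dispatcher role; the topic name and handlers here are illustrative):

```python
from collections import defaultdict
from typing import Callable

# Topic -> list of subscriber callbacks; a broker plays this role in production.
subscribers: defaultdict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, event: dict) -> None:
    # The producer doesn't know who (if anyone) consumes the event.
    for handler in subscribers[topic]:
        handler(event)

# Two independent consumers performing different processing on the same event:
subscribe("telemetry", lambda e: print("alerting on", e))
subscribe("telemetry", lambda e: print("archiving", e))
publish("telemetry", {"device": "sensor-1", "temp": 71.3})
```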

Big data architectures divide a very large dataset into chunks and perform parallel processing across the entire set for analysis and reporting. Big compute, also called high-performance computing (HPC), performs parallel computations across a large number (thousands) of cores. Domains include simulations, modeling, and 3-D rendering.

| Architecture style | Dependency management | Domain type |
| ----------- | ----------- | ----------- |
| N-tier | Horizontal tiers divided by subnet | Traditional business domain. Frequency of updates is low. |
| Web-Queue-Worker | Front and backend jobs, decoupled by async messaging | Relatively simple domain with some resource-intensive tasks. |
| Microservices | Vertically (functionally) decomposed services that call each other through APIs | Complicated domain. Frequent updates. |
| Event-driven architecture | Producer/consumer. Independent view per subsystem. | IoT and real-time systems. |
| Big data | Divide a huge dataset into small chunks. Parallel processing on local datasets. | Batch and real-time data analysis. Predictive analysis using ML. |
| Big compute | Data allocation to thousands of cores. | Compute-intensive domains such as simulation. |

Here are some of the types of challenges to consider when selecting an architecture style:

Big compute

When to use this architecture

Big data

Most big data architectures include some or all of the following components:

When to use this architecture

Event-driven architecture style

An event-driven architecture can use a pub/sub model or an event stream model.

When to use this architecture

Microservices architecture style

A bounded context is a natural division within a business and provides an explicit boundary within which a domain model exists.

What are microservices?

Besides the services themselves, some other components appear in a typical microservices architecture:

Benefits

N-tier architecture style

N-tier architectures are typically implemented as infrastructure as a service (IaaS) applications, with each tier running on a separate set of VMs.

Web-Queue-Worker architecture style

When to use this architecture

Design principles for Azure applications

IaaS is like having a box of parts. You can build anything, but you have to assemble it yourself. PaaS options are easier to configure and administer. You don't need to provision VMs, set up VNets, manage patches and updates, or handle all of the other overhead associated with running software on a VM.

Your application may have specific requirements that make an IaaS approach more suitable. However, even if your application is based on IaaS, look for places where it may be natural to incorporate PaaS options.

Public-facing services should expose a RESTful API over HTTP. Back-end services might use an RPC-style messaging protocol for performance reasons.

When services expose well-defined APIs, you can develop and test against those APIs. That way, you can develop and test an individual service without spinning up all of its dependent services. (Of course, you would still perform integration and load testing against the real services.)

Functional requirements let you judge whether the application does the right thing. Nonfunctional requirements let you judge whether the application does those things well.

Technology choices

Choose a compute service

Definitions:

Infrastructure-as-a-Service (IaaS) lets you provision individual VMs along with the associated networking and storage components. Then you deploy whatever software and applications you want onto those VMs.

Platform-as-a-Service (PaaS) provides a managed hosting environment, where you can deploy your application without needing to manage VMs or networking resources.

Functions-as-a-Service (FaaS) goes even further in removing the need to worry about the hosting environment. In a FaaS model, you simply deploy your code and the service automatically runs it.

For a microservices architecture, two approaches are especially popular:

Choose a container option

Decision tree for bare-metal Kubernetes at the edge

Choose a data store

A relational database is very useful when strong consistency guarantees are important - where all changes are atomic and transactions always leave the data in a consistent state. However, an RDBMS generally can't scale out horizontally without sharding the data in some way. Also, the data in an RDBMS must be normalized, which isn't appropriate for every data set.

Relational database Workload

Relational database Data type

A key/value store associates each data value with a unique key. Most key/value stores only support simple query, insert, and delete operations. To modify a value (either partially or completely), an application must overwrite the existing data for the entire value. In most implementations, reading or writing a single value is an atomic operation.
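
A sketch of that whole-value overwrite semantics, with a plain dict standing in for the store (the key naming and value shape are illustrative):

```python
import json

# A plain dict stands in for a key/value store: opaque values addressed by key.
store: dict[str, bytes] = {}

def put(key: str, value: bytes) -> None:
    store[key] = value  # overwrites the entire value; no partial update

def get(key: str) -> bytes:
    return store[key]

put("user:42", b'{"name": "Ada", "plan": "free"}')

# To change one field, the application must read, modify, and rewrite the
# whole value, since the store can't update a value partially:
user = json.loads(get("user:42"))
user["plan"] = "pro"
put("user:42", json.dumps(user).encode())
print(get("user:42"))
```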

Key/value store Workload

Key/value store Data type

A document database stores a collection of documents, where each document consists of named fields and data. The data can be simple values or complex elements such as lists and child collections. Documents are retrieved by unique keys.

Document database Workload

Document database Data type

A graph database stores two types of information, nodes and edges. Edges specify relationships between nodes. Nodes and edges can have properties that provide information about that node or edge, similar to columns in a table. Edges can also have a direction indicating the nature of the relationship.

Graph database Workload

Graph database Data type

Data analytics stores provide massively parallel solutions for ingesting, storing, and analyzing data. The data is distributed across multiple servers to maximize scalability. Large data file formats such as delimited files (CSV), Parquet, and ORC are widely used in data analytics.

Data analytics Workload

A column-family database organizes data into rows and columns. In its simplest form, a column-family database can appear very similar to a relational database, at least conceptually. The real power of a column-family database lies in its denormalized approach to structuring sparse data. You can think of a column-family database as holding tabular data with rows and columns, but the columns are divided into groups known as column families. Each column family holds a set of columns that are logically related together and are typically retrieved or manipulated as a unit. Other data that is accessed separately can be stored in separate column families.

Within a column family, new columns can be added dynamically, and rows can be sparse (that is, a row doesn't need to have a value for every column). Unlike a key/value store or a document database, most column-family databases store data in key order, rather than by computing a hash. Many implementations allow you to create indexes over specific columns in a column family. Indexes let you retrieve data by column value, rather than row key.
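
A rough sketch of the model using nested dicts, mapping row key to column family to columns (the table contents are illustrative):

```python
# Row key -> column family -> column -> value. Rows can be sparse:
# a row only stores the columns it actually has.
table: dict[str, dict[str, dict[str, str]]] = {
    "user:1": {
        "identity": {"name": "Ada", "email": "ada@example.com"},
        "activity": {"last_login": "2024-01-05"},
    },
    "user:2": {
        "identity": {"name": "Grace"},  # no email and no activity family
    },
}

# A column family is retrieved as a unit; other families aren't touched.
print(table["user:1"]["identity"])

# New columns can be added dynamically, without a schema change:
table["user:2"].setdefault("activity", {})["last_login"] = "2024-02-01"

# Most implementations keep rows in key order rather than hashing keys:
for row_key in sorted(table):
    print(row_key, list(table[row_key]))
```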

Column-family database Workload

Column-family database Data type

A search engine database allows applications to search for information held in external data stores. A search engine database can index massive volumes of data and provide near real-time access to these indexes.

Search engine database Workload

Search engine database Data type

Time series data is a set of values organized by time. Time series databases typically collect large amounts of data in real time from a large number of sources. Updates are rare, and deletes are often done as bulk operations. Although the records written to a time-series database are generally small, there are often a large number of records, and total data size can grow rapidly.

Time series database Workload

Time series database Data type

Object storage is optimized for storing and retrieving large binary objects (images, files, videos and audio streams, large application data objects and documents, virtual machine disk images). Large data files are also popularly used in this model, for example, delimited files (CSV), Parquet, and ORC. Object stores can manage extremely large amounts of unstructured data.

Object storage Workload

Object storage Data type

Sometimes, using simple flat files can be the most effective means of storing and retrieving information. Using file shares enables files to be accessed across a network. Given appropriate security and concurrent access control mechanisms, sharing data in this way can enable distributed services to provide highly scalable data access for performing basic, low-level operations such as simple read and write requests.

Shared files Workload

Shared files Data type

Criteria for choosing a data store

General considerations

Functional requirements

  • Data format. What type of data are you intending to store? Common types include transactional data, JSON objects, telemetry data, search indexes, or flat files.
  • Data size. How large are the entities you need to store? Will these entities need to be maintained as a single document, or can they be split across multiple documents, tables, collections, and so forth?
  • Scale and structure. What is the overall amount of storage capacity you need? Do you anticipate partitioning your data?
  • Data relationships. Will your data need to support one-to-many or many-to-many relationships? Are relationships themselves an important part of the data? Will you need to join or otherwise combine data from within the same dataset, or from external datasets?
  • Consistency model. How important is it for updates made in one node to appear in other nodes, before further changes can be made? Can you accept eventual consistency? Do you need ACID guarantees for transactions?
  • Schema flexibility. What kind of schemas will you apply to your data? Will you use a fixed schema, a schema-on-write approach, or a schema-on-read approach?
  • Concurrency. What kind of concurrency mechanism do you want to use when updating and synchronizing data? Will the application perform many updates that could potentially conflict? If so, you may require record locking and pessimistic concurrency control. Alternatively, can you support optimistic concurrency control? If so, is simple timestamp-based concurrency control enough, or do you need the added functionality of multi-version concurrency control? (A version-check sketch of optimistic concurrency follows this list.)
  • Data movement. Will your solution need to perform ETL tasks to move data to other stores or data warehouses?
  • Data lifecycle. Is the data write-once, read-many? Can it be moved into cool or cold storage?
  • Other supported features. Do you need any other specific features, such as schema validation, aggregation, indexing, full-text search, MapReduce, or other query capabilities?
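
As referenced in the Concurrency item above, here is a minimal sketch of version-based optimistic concurrency control (the record shape is illustrative; a timestamp could play the version's role):

```python
class ConflictError(Exception):
    pass

# Each record carries a version number that changes on every write.
record = {"balance": 100, "version": 1}

def update_balance(rec: dict, new_balance: int, expected_version: int) -> None:
    # Optimistic control: no lock is taken; the write succeeds only if
    # nobody else changed the record since we read it.
    if rec["version"] != expected_version:
        raise ConflictError("record changed since read; re-read and retry")
    rec["balance"] = new_balance
    rec["version"] += 1

v = record["version"]           # read
update_balance(record, 150, v)  # write succeeds: version still matches
try:
    update_balance(record, 175, v)  # stale version -> conflict
except ConflictError as exc:
    print(exc)
```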

Non-functional requirements

  • Performance and scalability. What are your data performance requirements? Do you have specific requirements for data ingestion rates and data processing rates? What are the acceptable response times for querying and aggregation of data once ingested? How large will you need the data store to scale up? Is your workload more read-heavy or write-heavy?
  • Reliability. What overall SLA do you need to support? What level of fault tolerance do you need to provide for data consumers? What kind of backup and restore capabilities do you need?
  • Replication. Will your data need to be distributed among multiple replicas or regions? What kind of data replication capabilities do you require?
  • Limits. Will the limits of a particular data store support your requirements for scale, number of connections, and throughput?

Management and cost

  • Managed service. When possible, use a managed data service, unless you require specific capabilities that can only be found in an IaaS-hosted data store.
  • Region availability. For managed services, is the service available in all regions? Does your solution need to be hosted in certain regions?
  • Portability. Will your data need to be migrated to on-premises, external datacenters, or other cloud hosting environments?
  • Licensing. Do you have a preference of a proprietary versus OSS license type? Are there any other external restrictions on what type of license you can use?
  • Overall cost. What is the overall cost of using the service within your solution? How many instances will need to run, to support your uptime and throughput requirements? Consider operations costs in this calculation. One reason to prefer managed services is the reduced operational cost.
  • Cost effectiveness. Can you partition your data, to store it more cost effectively? For example, can you move large objects out of an expensive relational database into an object store?

Security

  • Security. What type of encryption do you require? Do you need encryption at rest? What authentication mechanism do you want to use to connect to your data?
  • Auditing. What kind of audit log do you need to generate?
  • Networking requirements. Do you need to restrict or otherwise manage access to your data from other network resources? Does data need to be accessible only from inside the cloud environment? Does the data need to be accessible from specific IP addresses or subnet? Does it need to be accessible from applications or services hosted on-premises or in other external datacenters?

DevOps

  • Skill set. Are there particular programming languages, operating systems, or other technologies that your team is particularly adept at using? Are there others that would be difficult for your team to work with?
  • Clients. Is there good client support for your development languages?

The databases that a business uses to store all its transactions and records are called online transaction processing (OLTP) databases. These databases usually have records that are entered one at a time. Often they contain a great deal of information that is valuable to the organization. The databases that are used for OLTP, however, were not designed for analysis. Therefore, retrieving answers from these databases is costly in terms of time and effort. OLAP systems were designed to help extract this business intelligence information from the data in a highly performant way. This is because OLAP databases are optimized for heavy read, low write workloads.

Transactional data is information that tracks the interactions related to an organization's activities. These interactions are typically business transactions, such as payments received from customers, payments made to suppliers, products moving through inventory, orders taken, or services delivered. Transactional events, which represent the transactions themselves, typically contain a time dimension, some numerical values, and references to other data.

Transactions typically need to be atomic and consistent. Atomicity means that an entire transaction always succeeds or fails as one unit of work, and is never left in a half-completed state. If a transaction cannot be completed, the database system must roll back any steps that were already done as part of that transaction. In a traditional RDBMS, this rollback happens automatically if a transaction cannot be completed. Consistency means that transactions always leave the data in a valid state.
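
A minimal demonstration of this rollback behavior using Python's built-in sqlite3 module (the accounts table and the simulated mid-transaction failure are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # commits on success, rolls back automatically on error
        conn.execute(
            "UPDATE accounts SET balance = balance - 60 WHERE name = 'alice'")
        raise RuntimeError("simulated failure before the matching credit runs")
except RuntimeError:
    pass

# Atomicity: the debit above was rolled back, not left half-completed.
print(conn.execute("SELECT name, balance FROM accounts").fetchall())
# -> [('alice', 100), ('bob', 0)]
```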

A non-relational database is a database that does not use the tabular schema of rows and columns found in most traditional database systems. Instead, non-relational databases use a storage model that is optimized for the specific requirements of the type of data being stored.

Choose a network service

Choose a messaging service

Messages can be classified into two main categories. If the producer expects an action from the consumer, that message is a command. If the message informs the consumer that an action has taken place, then the message is an event.
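
A small sketch of the distinction (both message types and their fields are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ShipOrder:
    """A command: the producer expects the consumer to act on it."""
    order_id: int
    address: str

@dataclass
class OrderShipped:
    """An event: it informs consumers that something already happened."""
    order_id: int
    shipped_at: str

# Commands usually go to one handler; events may have many subscribers.
print(ShipOrder(42, "1 Main St"))
print(OrderShipped(42, "2024-03-01T12:00:00Z"))
```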

Best practices for cloud applications

API design

A resource in REST doesn't have to be based on a single physical data item. For example, an order resource might be implemented internally as several tables in a relational database, but presented to the client as a single entity. Avoid creating APIs that simply mirror the internal structure of a database. The purpose of REST is to model entities and the operations that an application can perform on those entities. A client should not be exposed to the internal implementation.

Consider the relationships between different types of resources and how you might expose these associations. For example, /customers/5/orders might represent all of the orders for customer 5. You could also go in the other direction, and represent the association from an order back to a customer with a URL such as /orders/99/customer. However, extending this model too far can become cumbersome to implement. A better solution is to provide navigable links to associated resources in the body of the HTTP response message.
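
For instance, an order representation might embed navigable links like the following hypothetical shape (shown here as a Python dict):

```python
import json

# Links to associated resources in the response body avoid forcing clients
# onto deep URI paths like /orders/99/customer/addresses/...
order = {
    "id": 99,
    "status": "shipped",
    "links": [
        {"rel": "self",     "href": "/orders/99"},
        {"rel": "customer", "href": "/customers/5"},
        {"rel": "items",    "href": "/orders/99/items"},
    ],
}
print(json.dumps(order, indent=2))
```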

Avoid requiring resource URIs more complex than collection/item/collection.

Avoid introducing dependencies between the web API and the underlying data source. For example, if your data is stored in a relational database, the web API doesn't need to expose each table as a collection of resources. In fact, that's probably a poor design. Instead, think of the web API as an abstraction of the database. If necessary, introduce a mapping layer between the database and the web API. That way, client applications are isolated from changes to the underlying database schema.

PUT requests must be idempotent. If a client submits the same PUT request multiple times, the results should always be the same (the same resource will be modified with the same values). POST and PATCH requests are not guaranteed to be idempotent.
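
A minimal sketch of what PUT idempotency means in practice (the in-memory resource table and URIs are illustrative):

```python
# Replaying the same PUT any number of times leaves the resource in the
# same state, because PUT is a full replacement, not an increment.
resources: dict[str, dict] = {}

def put_resource(uri: str, representation: dict) -> int:
    created = uri not in resources
    resources[uri] = representation
    return 201 if created else 200

print(put_resource("/customers/5", {"name": "Ada"}))  # 201 Created
print(put_resource("/customers/5", {"name": "Ada"}))  # 200 OK, same state
assert resources["/customers/5"] == {"name": "Ada"}

# Contrast: an operation like "append an order to a collection" is not
# idempotent, which is why POST carries no idempotency guarantee.
```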

API implementation

The HTTP protocol provides the chunked transfer encoding mechanism to stream large data objects back to a client. When the client sends an HTTP GET request for a large object, the web API can send the reply back in piecemeal chunks over an HTTP connection. The length of the data in the reply may not be known initially (it might be generated), so the server hosting the web API should send a response message with each chunk that specifies the Transfer-Encoding: Chunked header rather than a Content-Length header. The client application can receive each chunk in turn to build up the complete response. The data transfer completes when the server sends back a final chunk with zero size.
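
A sketch of such a streaming response, assuming the Flask framework is available (any framework that accepts a generator response behaves similarly; the endpoint and data are illustrative):

```python
from flask import Flask, Response

app = Flask(__name__)

@app.get("/large-object")
def large_object():
    def generate():
        # The total length isn't known up front; each yield becomes a chunk,
        # so the server responds with Transfer-Encoding: chunked instead of
        # a Content-Length header.
        for i in range(1_000):
            yield f"record {i}\n"
        # The final zero-size chunk that ends the transfer is emitted by the
        # server automatically when this generator is exhausted.
    return Response(generate(), mimetype="text/plain")

if __name__ == "__main__":
    app.run()
```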

It is important to ensure that the web API is implemented to maintain responsiveness under a heavy load, to be scalable to support a highly varying workload, and to guarantee availability for clients that perform business-critical operations.

Autoscaling

If the solution implements a long-running task, design this task to support both scaling out and scaling in. Without due care, such a task could prevent an instance of a process from being shut down cleanly when the system scales in, or it could lose data if the process is forcibly terminated. Ideally, refactor a long-running task and break up the processing that it performs into smaller, discrete chunks.

Background jobs

Ideally, background tasks are “fire and forget” operations, and their execution progress has no impact on the UI or the calling process.

Caching

If an application chooses not to cache this data on the basis that the cached information will nearly always be outdated, then the same consideration could be true when storing and retrieving this information from the data store. In the time it takes to save and fetch this data, it might have changed. In a situation such as this, consider the benefits of storing the dynamic information directly in the cache instead of in the persistent data store. If the data is noncritical and does not require auditing, then it doesn't matter if the occasional change is lost.

Consider implementing a local, private cache in each instance of an application, together with the shared cache that all application instances access. When the application retrieves an item, it can check first in its local cache, then in the shared cache, and finally in the original data store. The local cache can be populated using the data in either the shared cache, or in the database if the shared cache is unavailable. This approach requires careful configuration to prevent the local cache from becoming too stale with respect to the shared cache. However, the local cache acts as a buffer if the shared cache is unreachable.
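
A compact sketch of that lookup chain, with plain dicts standing in for the per-instance cache, the shared cache, and the data store:

```python
local_cache: dict[str, str] = {}
shared_cache: dict[str, str] = {"greeting": "hello"}   # e.g., a distributed cache
database: dict[str, str] = {"greeting": "hello", "farewell": "bye"}

def get(key: str) -> str:
    if key in local_cache:            # 1. private, per-instance cache
        return local_cache[key]
    try:
        value = shared_cache[key]     # 2. shared cache (may be unreachable)
    except KeyError:
        value = database[key]         # 3. fall back to the original data store
    local_cache[key] = value          # populate the local buffer
    return value

print(get("farewell"))  # misses both caches, reads the database
print(get("farewell"))  # now served from the local cache
```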

If a shared cache is large, it might be beneficial to partition the cached data across nodes to reduce the chances of contention and improve scalability.

Content Delivery Network

CDNs are typically used to deliver static content such as images, style sheets, documents, client-side script, and HTML pages. The major advantages of using a CDN are lower latency and faster delivery of content to users, regardless of their geographical location in relation to the datacenter where the application is hosted. CDNs can also help to reduce load on a web application, because the application does not have to service requests for the content that is hosted in the CDN.

Data partitioning

Vertical partitioning operates at the entity level within a data store, partially normalizing an entity to break it down from a wide item to a set of narrow items. It is ideally suited for column-oriented data stores such as HBase and Cassandra.

Use business requirements to determine the critical queries that must always perform quickly.

If cross-partition joins are necessary, run parallel queries over the partitions and join the data within the application.
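A sketch of that fan-out-and-join approach (the partitioned data and the query predicate are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Three partitions of a sharded "orders" dataset (illustrative data).
partitions = [
    [{"customer": 5, "total": 20}, {"customer": 7, "total": 15}],
    [{"customer": 5, "total": 30}],
    [{"customer": 9, "total": 50}, {"customer": 5, "total": 5}],
]

def query_partition(partition: list[dict], customer: int) -> list[dict]:
    # In production this would be a query against one shard's data store.
    return [row for row in partition if row["customer"] == customer]

# Fan the same query out to every partition in parallel ...
with ThreadPoolExecutor() as pool:
    results = pool.map(query_partition, partitions, [5] * len(partitions))

# ... then join/merge the partial results inside the application.
orders = [row for partial in results for row in partial]
print(orders, "total:", sum(row["total"] for row in orders))
```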

Message encoding considerations

The producer of the message defines the message shape based on the business logic and the information it wants to send to the consumer(s).

As business requirements change, the shape is expected to change, and the schema will evolve. Versioning allows the producer to indicate schema updates that might include new features.

Monitoring and diagnostics

A more advanced system might include a predictive element that performs a cold analysis over recent and current workloads.

Health monitoring provides an immediate view of the current health of the system, while availability monitoring is concerned with tracking the availability of the system and its components to generate statistics about the uptime of the system.

You can calculate the percentage availability of a service over a period of time by using the following formula:

%Availability = ((Total Time - Total Downtime) / Total Time) * 100

This is useful for SLA purposes.
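
For example, a service that is down for a total of 43 minutes during a 30-day month (43,200 minutes) has ((43,200 - 43) / 43,200) * 100 ≈ 99.9% availability.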

Performance tuning and antipatterns

Performance is frequently measured in terms of throughput, response time, and availability.

Performance targets should explicitly include a target load. Also, not all users will receive exactly the same level of performance, even when accessing the system simultaneously and performing the same work.

Performance antipatterns

  1. Busy Database
    Many database systems can run code. Examples include stored procedures and triggers. Often, it’s more efficient to perform this processing close to the data, rather than transmitting the data to a client application for processing. However, overusing these features can hurt performance.
  2. Busy Front End
    Performing asynchronous work on a large number of background threads can starve other concurrent foreground tasks of resources, decreasing response times to unacceptable levels. This problem typically occurs when an application is developed as a monolithic piece of code, with all of the business logic combined into a single tier shared with the presentation layer.
  3. Chatty I/O
    The cumulative effect of a large number of I/O requests can have a significant impact on performance and responsiveness.
  4. Extraneous Fetching
    More data than needed is retrieved for a business operation, often resulting in unnecessary I/O overhead and reduced responsiveness. For example, for each request the database returned 80,503 bytes, but the response to the client contained only 19,855 bytes, about 25 percent of the size of the database response.
  5. Improper Instantiation
    Sometimes new instances of a class are continually created when the object is meant to be created once and then shared. If an object isn't thread-safe and therefore can't be shared, create a new instance per use or manage a pool of objects; if it is thread-safe, share a single (singleton) instance.
  6. Monolithic Persistence
    Putting all of an application’s data into a single data store can hurt performance, either because it leads to resource contention, or because the data store is not a good fit for some of the data.
  7. No Caching
    In a cloud application that handles concurrent requests, repeatedly fetching the same data can reduce performance and scalability.
  8. Noisy Neighbor
    The problem occurs when one tenant’s performance is degraded because of the activities of another tenant.
  9. Retry Storm
    When a service is unavailable or busy, having clients retry their connections too frequently can cause the service to struggle to recover, and can make the problem worse. (See the backoff sketch after this list.)
  10. Synchronous I/O
    Blocking the calling thread while I/O completes can reduce performance and affect vertical scalability.
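
As referenced in the Retry Storm item, a common mitigation is retry with exponential backoff and jitter. A minimal sketch (call_service is a stand-in for the real remote call):

```python
import random
import time

def call_service() -> str:
    raise ConnectionError("service busy")  # simulate an unavailable service

def call_with_backoff(max_attempts: int = 5, base_delay: float = 0.1) -> str:
    for attempt in range(max_attempts):
        try:
            return call_service()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up; let the caller handle the failure
            # Exponential backoff with jitter spreads retries out over time
            # instead of hammering a struggling service.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

try:
    call_with_backoff()
except ConnectionError as exc:
    print("gave up:", exc)
```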

Architecture for startups

Kent Beck describes a three-stage process of software product innovation. Those stages are explore, expand, and extract.

In a product’s initial explore stage, you need to optimize deployment for speed, cost, and optionality. Optionality refers to how fast you can change directions within a given architecture.
A business in the expand and extract phases of product development might use a service-oriented or microservice architecture. This type of deployment architecture is rarely right for a startup that hasn’t yet found product/market fit or commercial traction.
For a core startup stack, a simple monolithic design is best. This design limits the time spent managing infrastructure, while providing ample ability to scale as the startup wins more customers.

Bugs aren’t caused by complexity, but a complex stack makes it easier to ship bugs. Not all sophisticated architectures are a waste of energy, but they waste your resources if you haven’t yet found product/market fit. Your first startup stack should be simple and get out of your way, so you can concentrate on product development.

The following simple diagram shows the recommended core startup stack. These components are enough to get your product off the ground and into the hands of your customers. For 80 percent of startups, this stack is all you need to test the basic hypotheses built into your product. Startups working in machine learning, internet of things (IoT), or highly regulated environments might require more components.

Design patterns

Data management patterns

Design and implementation

Other