Concepts for System Design Interviews

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages

HTTP also comes with some “verbs” or “methods”, which are commands that give an idea of what sort of operation is intended to be performed. For example, common HTTP methods are “GET”, “POST”, “PUT”, “DELETE” and “PATCH”, but there are more. In the image above, look for the HTTP verb in the start line.

Section 2: Storage, Latency, and Throughput

Storage

Storage is about holding information. Any app, system, or service that you program will need to store and retrieve data, and those are the two fundamental purposes of storage.

But it’s not just about storing data; it’s also about fetching it. We use a database to achieve this. A database is a software layer that helps us store and retrieve data.

These two primary types of operations, storing and retrieving, are also variously called ‘set, get’, ‘store, fetch’, ‘write, read’ and so on. To interact with storage, you will need to go through some sort of server that acts as an intermediary to perform these fundamental operations.

The word “storage” can sometimes lead us to think about it in physical terms. If I “store” my bike in the shed, I can expect it to be there when I next open the shed.

But that doesn’t always happen in the computing world. Broadly, storage comes in two types: “memory” storage and “disk” storage.

Of these two, disk storage tends to be more robust and “permanent”. It is not truly permanent, which is why we usually call it “persistent” storage. Disk storage is persistent storage: when you save something to disk and then turn off or restart your server, that data will “persist”. It won’t be lost.

However, if you leave data in “memory”, it will usually be wiped when you shut down, restart, or lose power.

The computer you use every day has both of these storage types. Your hard disk is “persistent” disk storage, and your RAM is transient.

On servers, if the data you’re keeping track of is only useful during a session of that server, then it makes sense to keep it in memory. This is much faster and cheaper than writing things to a persistent database.

For example, a single session might be the time during which a user is logged in and using your site. After they log out, you may not need to hold on to the bits of data collected during the session.

But whatever you do want to keep, such as shopping cart history, you will put in persistent disk storage. That way you can access that data the next time the user logs in, and they will have a seamless experience.

OK, so this seems pretty simple and basic, and it’s meant to be. This is a primer. Storage can get very complex. If you take a look at the range of storage products and solutions, your head will spin.

That’s because different use cases require different types of storage. Choosing the right storage types for your system depends on many factors, including the needs of your application and how users interact with it. Other factors include:

the shape (structure) of your data, or
what sort of availability it needs (what level of downtime is acceptable for your storage), or
scalability (how fast you need to read and write data, and whether those reads and writes happen concurrently or sequentially), or
consistency: if you protect against downtime by using distributed storage, how consistent is the data across your stores?

These questions, and the conclusions you reach, require you to weigh your trade-offs carefully. Is consistency more important than speed? Do you need the database to handle millions of operations per minute, or only nightly updates? I will cover these concepts in later sections, so don’t worry if you have no idea what they are yet.

Latency

“Latency” and “throughput” are terms you’ll hear a lot as you gain more experience designing systems that support your application’s front end. They are fundamental to the experience and performance of your application and of the system as a whole. People often use these terms in a broader sense than intended, or out of context, so let’s fix that.

Latency is simply a measure of duration. What duration? It is the time an action takes to complete or to produce a result, for example moving data from one place in the system to another. You can think of it as a lag, or simply as the time it takes to move the data.

The most commonly understood latency is the “round trip” network request: how long it takes for your front-end website (the client) to send a query to your server and get a response back.

When you’re loading a site, you want it to be as fast and smooth as possible. In other words, you want low latency. Fast lookups mean low latency. So finding a value in an array of elements is slower than finding a value in a hash table. The array has higher latency because you may need to iterate over every element to find the one you want. The hash table has lower latency because you look up the data in “constant” time using the key, with no iteration needed.
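Here is a minimal Python sketch of that comparison. Python’s built-in `dict` is a hash table, so it stands in for one here. The helper names `find_in_list` and `find_in_table` and the sample data are hypothetical and only serve as illustration.

```python
from typing import Optional

# The same small catalog stored two ways: a list of pairs and a dict (hash table).
pairs = [("apple", 3), ("banana", 5), ("cherry", 7)]
table = dict(pairs)

def find_in_list(items: list, key: str) -> Optional[int]:
    """Linear search: may inspect every element before finding the key, O(N)."""
    for k, v in items:
        if k == key:
            return v
    return None

def find_in_table(index: dict, key: str) -> Optional[int]:
    """Hash lookup: goes straight to the key's slot, O(1) on average."""
    return index.get(key)

print(find_in_list(pairs, "cherry"))   # checks all three entries first
print(find_in_table(table, "cherry"))  # a single lookup
```

The list search slows down as the catalog grows. The dict lookup stays roughly constant.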

Similarly, reading from memory is much faster than reading from disk. But both have latency, and your needs will determine which type of storage you choose for which data.

In that sense, latency is the inverse of speed: you want higher speeds and lower latency. Speed is also determined by distance, especially for network calls such as those made over HTTP. So the latency between London and another city will be affected by how far that city is from London.
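You can see why distance matters by computing the minimum round-trip time that physics allows. This sketch assumes that signals in optical fiber travel at about 200 km per millisecond, which is roughly two-thirds of the speed of light in a vacuum. It ignores routing detours, queuing, and processing time, so real latencies will be higher. The distances are hypothetical.

```python
# Assumption: light in optical fiber covers roughly 200 km per millisecond.
# Real routes are longer than straight lines and add queuing and processing
# delays, so this figure is a lower bound, not a prediction.
FIBER_KM_PER_MS = 200.0

def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on a network round trip over the given one-way distance."""
    one_way_ms = distance_km / FIBER_KM_PER_MS
    return 2 * one_way_ms

for km in (50, 1_000, 5_000):  # hypothetical distances from London
    print(f"{km:>5} km -> at least {min_round_trip_ms(km):.1f} ms round trip")
```

No amount of server tuning can beat this floor. Only moving the data closer to the user, for example with caching or CDNs, can lower it.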

As you can imagine, you want to design a system that avoids pinging distant servers, but storing things in memory may not be feasible for your system. These are the trade-offs that make system design complex, challenging, and extremely interesting!

For example, news websites may prefer uptime and availability over loading speed, whereas online multiplayer games may need both availability and super-low latency. These requirements will determine the design of, and the investment in, the infrastructure that supports the system’s particular needs.

Throughput

Throughput can be understood as the maximum capacity of a machine or system. It’s often used in factories to calculate how much work an assembly line can do in an hour, a day, or some other unit of time.

For example, an assembly line that can assemble 20 cars per hour has a throughput of 20 cars per hour. In computing, throughput is the amount of data that can pass through in a unit of time. So a 512 Mbps internet connection is a measure of throughput: 512 megabits (not megabytes) per second, which is 64 megabytes per second.

Now imagine freeCodeCamp’s web server. If it receives 1 million requests per second but can serve only 800,000 of them, its throughput is 800,000 requests per second. You might instead measure throughput in bits rather than requests, in which case it would be N bits per second.

In this example there is a bottleneck, because the server cannot push out more than N bits per second while the requests demand more than that. A bottleneck is therefore the constraint on a system. A system is only as fast as its slowest bottleneck.

Suppose requests pass through three servers in turn. One can handle 100 bits per second, another 120 bits per second, and the third only 50. The overall system will then operate at 50 bps, because that is the constraint: it holds back the speed of the other servers in the system.
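The bottleneck rule for servers chained in sequence fits in a few lines of Python. The capacities match the example above. The dictionary keys and the function name are hypothetical.

```python
# Hypothetical capacities (bits per second) of three servers that every
# request passes through in sequence, as in the example above.
stage_capacity_bps = {"server_1": 100, "server_2": 120, "server_3": 50}

def pipeline_throughput(capacities: dict) -> tuple:
    """A serial pipeline can move no more than its slowest stage allows."""
    bottleneck = min(capacities, key=capacities.get)
    return bottleneck, capacities[bottleneck]

name, bps = pipeline_throughput(stage_capacity_bps)
print(f"System throughput: {bps} bps, limited by {name}")

# Upgrading a stage that is not the bottleneck changes nothing overall.
faster_server_2 = {**stage_capacity_bps, "server_2": 500}
print(pipeline_throughput(faster_server_2))
```

The second call returns the same bottleneck and the same 50 bps. This is why effort spent anywhere other than the bottleneck can be wasted.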

So increasing throughput anywhere other than at the bottleneck may be wasted effort. You probably want to increase throughput at the tightest bottleneck first.

You can increase throughput by buying more hardware (horizontal scaling), by increasing the capacity and performance of your existing hardware (vertical scaling), or in a few other ways.

Increasing throughput can sometimes be only a short-term fix. A good systems designer will therefore think through the best ways to scale the throughput of a given system, including splitting up requests (or any other form of “load”) and distributing them across other resources. The key points to remember are what throughput is, what a constraint or bottleneck is, and how it affects a system.

Fixing latency or throughput is not an isolated, universal solution in itself, and the two are not correlated with each other. Both have impacts and considerations across the whole system. That is why it’s important to understand the system as a whole and the nature of the demands that will be placed on it over time.

Section 3: System Availability

Software engineers aim to build systems that are reliable. A reliable system is one that consistently satisfies a user’s needs whenever that user seeks to have them satisfied. A key component of that reliability is availability.

It’s helpful to think of availability as the resiliency of a system. If a system is robust enough to handle failures in the network, database, servers, and so on, then it can generally be considered a fault-tolerant system, which makes it an available system.

Of course, a system is in many senses the sum of its parts. If availability matters to the end-user experience of the site or app, then each part needs to be highly available.
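One way to quantify “each part needs to be highly available” is to multiply availabilities along a chain in which every component must be up for a request to succeed. This minimal sketch assumes components fail independently, which real systems do not always do. The 99.9% figures are hypothetical.

```python
import math

def serial_availability(parts: list) -> float:
    """Availability of a chain where every part must be up.

    Assumes each part fails independently of the others.
    """
    return math.prod(parts)

# Hypothetical chain: load balancer, web server, database, each 99.9% available.
chain = [0.999, 0.999, 0.999]
print(f"{serial_availability(chain):.4%}")
```

The chain as a whole comes out at about 99.7%, which is lower than any single part. Every extra component in the request path lowers overall availability.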

Quantifying Availability

To quantify the availability of a system, we calculate the percentage of time that the system’s primary functionality and operations are available (the uptime) in a given window of time.
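As a small worked example of that definition, this Python helper turns measured uptime over a window into a percentage. The function name is hypothetical. The 30-day window and the 36 hours of downtime are illustrative numbers only.

```python
def availability_pct(uptime_hours: float, window_hours: float) -> float:
    """Percentage of a time window during which the core functionality was up."""
    return 100.0 * uptime_hours / window_hours

# Hypothetical 30-day window with 36 hours of downtime in total.
window = 30 * 24
print(availability_pct(window - 36, window))
```

This prints 95.0. In other words, 36 hours of downtime in a month still counts as “95% available”.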

The most business-critical systems need near-perfect availability. Systems with highly variable demand and load, with sharp peaks and troughs, may be able to get away with slightly lower availability during off-peak times.

It all depends on how the system is used and what it is. But in general, even systems with low but consistent demand, or with an implied guarantee that they are available “on demand”, need high availability.

Think of a site where you back up your pictures. You don’t often need to access and retrieve data from it, because it’s mainly a place to keep things. But you still expect it to be available whenever you log in, even if it’s just to download a single picture.

A different kind of availability can be understood in the context of massive e-commerce shopping days like Black Friday or Cyber Monday. On those days demand skyrockets, and millions of people try to access the deals simultaneously. Supporting those loads requires an extremely reliable, highly available system design.

A commercial reason for high availability is simply that any downtime on the site means the site loses money. Downtime can also be very bad for reputation, for example when other businesses rely on the service to provide their own services. If AWS S3 goes down, a lot of companies will suffer, including Netflix, and that is not good.

So uptime is extremely important for success. Remember that uptime percentages are calculated over the whole day (all 24 hours), so downtime of “just 5%” means 1.2 hours per day. That’s 36 hours per month. That’s a lot, especially because it could all happen in one stretch: 1.5 consecutive days. Not acceptable!

So uptimes are typically extremely high. It’s common to see figures like 99.99% uptime. That’s why it’s now common to refer to uptime in terms of “nines”, meaning the number of nines in the uptime guarantee.

99.99% uptime is 4 nines. That sounds good, but over a year it still allows about 52.6 minutes of downtime.

In today’s world, that can be unacceptable for large-scale or mission-critical services. That’s why “five nines” is now considered the ideal availability standard: it translates to a little over 5 minutes of downtime per year.
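The “nines” arithmetic is easy to check. This sketch assumes a 365-day year and converts a number of nines into the maximum downtime it permits. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year

def yearly_downtime_minutes(nines: int) -> float:
    """Maximum downtime per year allowed by an uptime guarantee of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 4 nines = 99.99% uptime -> 0.0001 down
    return unavailability * MINUTES_PER_YEAR

for n in range(2, 6):
    print(f"{n} nines: {yearly_downtime_minutes(n):,.1f} minutes/year")
```

Four nines comes out to about 52.6 minutes a year and five nines to about 5.3 minutes. Two nines (99%) already allows more than 3.6 days.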

SLAs

To make their services competitive and meet market expectations, online service providers typically offer Service Level Agreements (or Guarantees). These are a set of guaranteed service-level metrics. 99.999% uptime is one such metric and is often offered as part of premium subscriptions.

Database and cloud service providers may offer this even on trial or free tiers, if a customer’s core use of that product justifies the expectation of such a metric.

In many cases, if the provider fails to meet the SLA, the customer is entitled to credits or some other form of compensation. Here, by way of example, is Google’s SLA for the Google Maps API.

SLAs are therefore a critical part of the overall commercial and technical considerations when designing a system. It’s especially important to consider whether availability really is a key requirement for a given part of a system, and which parts require high availability.

Designing HA

When designing a high availability (HA) system, you need to reduce or eliminate “single points of failure”. A single point of failure is an element in the system whose failure, on its own, is enough to cause that undesirable loss of availability.

You eliminate single points of failure by designing “redundancy” into the system. Redundancy basically means providing one or more alternatives (i.e., backups) to an element that is critical for high availability.

Suppose your app requires users to be authenticated, and there is only one authentication service and back end. If that service fails, your system is no longer usable, because it is the single point of failure. By having two or more services that can handle authentication, you add redundancy and eliminate (or reduce) single points of failure.
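The benefit of redundancy can be estimated with similar arithmetic. If any one of several replicas is enough, the service is down only when all of them are down at once. This sketch assumes replica failures are independent, although shared power, networks, or bugs often break that assumption in practice. The 99% availability per replica is hypothetical.

```python
def redundant_availability(replica: float, count: int) -> float:
    """Availability when any one of `count` independent replicas is enough.

    The service is down only if every replica is down at the same time.
    """
    return 1 - (1 - replica) ** count

# Hypothetical authentication services, each 99% available on its own.
for n in (1, 2, 3):
    print(n, f"{redundant_availability(0.99, n):.6f}")
```

With these numbers, one replica gives 99%, two give 99.99%, and three give 99.9999%.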

Therefore, you need to understand your system and break it down into all its parts. Map out which parts are likely to become single points of failure, which ones cannot tolerate such a failure, and which ones can. This matters because engineering HA requires trade-offs, and some of those trade-offs can be expensive in time, money, and resources.

Section 4: Caching

Caching! This is a very fundamental and easy-to-understand technique for speeding up performance in a system. In other words, caching helps reduce “latency” in a system.

In our daily lives we use caching as a matter of common sense (most of the time…). If we live next door to a supermarket, we still want to buy and store some basics in our fridge and our food cupboard. This is caching. We could always step out, go next door, and buy these things every time we want food. But if they’re already in the pantry or fridge, we cut the time it takes to make our food. That’s caching.

Common Scenarios for Caching

Similarly, in software terms, if we rely on certain pieces of data often, we may want to cache that data so that our app performs faster.

This is often true when it’s faster to retrieve data from memory than from disk, because of the latency involved in making network requests. In fact, many websites are cached in CDNs, especially if their content doesn’t change often. This lets them be served to the end user much faster and reduces the load on the back-end servers.

Caching also helps when your back end has to do computationally intensive and time-consuming work. Caching previous results can turn a linear O(N) lookup into a constant O(1) one, which can be very advantageous.
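Python’s standard library ships a memoizing decorator, `functools.lru_cache`, which stores results keyed by the function’s arguments. In this sketch, `expensive_report` is a hypothetical stand-in for slow back-end work, and `time.sleep` simulates the cost. After the first call, repeated calls with the same argument become a dictionary lookup instead of a recomputation.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive_report(customer_id: int) -> int:
    """Hypothetical slow computation; the sleep simulates the expense."""
    time.sleep(0.1)
    return sum(i * customer_id for i in range(10_000))

start = time.perf_counter()
expensive_report(42)              # first call: computed, pays the full cost
first = time.perf_counter() - start

start = time.perf_counter()
expensive_report(42)              # second call: served from the cache
second = time.perf_counter() - start

print(f"first call {first:.3f}s, cached call {second:.6f}s")
print(expensive_report.cache_info())
```

`cache_info()` reports one miss (the first call) and one hit (the second).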

Similarly, your server may have to make multiple network requests and API calls to compose the data it sends back to the requester. In that case, caching the data can reduce the number of network calls, and therefore the latency.
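A common way to apply this is the cache-aside pattern: check the cache first and only make the expensive calls on a miss. Below is a minimal sketch that adds a time-to-live (TTL) so entries eventually refresh. `TTLCache` and `fetch_profile_from_apis` are hypothetical names, and the fetch function only simulates downstream API calls.

```python
import time
from typing import Any, Callable

class TTLCache:
    """Tiny cache-aside helper whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: dict = {}  # key -> (time stored, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]               # fresh hit: no network calls needed
        value = loader()                  # miss or stale entry: do the real work
        self._store[key] = (now, value)
        return value

calls = 0

def fetch_profile_from_apis() -> dict:
    """Hypothetical stand-in for several downstream API calls."""
    global calls
    calls += 1
    return {"name": "Ada", "orders": 3}

cache = TTLCache(ttl_seconds=30)
for _ in range(5):
    cache.get_or_load("user:42", fetch_profile_from_apis)
print(calls)
```

Five requests trigger only one simulated round of API calls, because the other four are served from the cache within the 30-second TTL.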

If your system has a client (front end), a server, and databases (back end), then caching can be inserted on the client (e.g., browser storage), between the client and the server (e.g., CDNs), or on the server itself. Caching on the server reduces calls over the network to the database.

So caching can occur at multiple points or levels in the system, including at the hardware (CPU) level.

Handling Stale Data

You may have noticed that the examples above are implicitly about “read” operations. Write operations follow much the same principles, with the following added considerations:

write operations require keeping the cache and your database in sync
this may increase complexity, because there are more operations to perform, and new considerations around handling unsynchronized or “stale” data need to be carefully analyzed
new design principles may need to be implemented to handle that syncing: should it be done synchronously or asynchronously? If asynchronously, at what intervals? Where is data served from in the meantime? How often does the cache need to be refreshed, etc.?
data “eviction”, or turnover and refreshing of data, to keep cached data fresh and up to date. This includes techniques like LIFO, FIFO, LRU and LFU.
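Of the eviction policies listed above, LRU (least recently used) is probably the easiest to sketch. The hypothetical `LRUCache` class below builds on Python’s `collections.OrderedDict`. Its `move_to_end` and `popitem(last=False)` methods keep entries ordered from least to most recently used.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional

class LRUCache:
    """Evicts the least recently used entry once `capacity` is exceeded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # drop the least recently used

    def keys(self) -> list:
        """Keys ordered from least to most recently used."""
        return list(self._data)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now the most recently used entry
cache.put("c", 3)    # over capacity: evicts "b", the least recently used
print(cache.keys())
```

This prints `['a', 'c']`. Entry “b” was evicted because it had gone unused the longest.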

So let’s end with some high-level, non-binding conclusions. Generally, caching works best for static or infrequently changing data, and when changes are likely to come from single operations rather than from user-generated operations.

Where consistency and freshness of data are critical, caching may not be an optimal solution. The exception is when another element in the system efficiently refreshes the caches at intervals that do not hurt the purpose or user experience of the application.

Section 5: Proxies

Proxy. What? Many of us have heard of proxy servers. We may have seen configuration options in our PC or Mac software that talk about adding and configuring proxy servers, or accessing “via a proxy”.

So let’s understand this relatively simple, widely used, and important piece of technology. “Proxy” is a word that exists in the English language completely independently of computer science, so let’s start with that definition.

Source: https://www.merriam-webster.com/dictionary/proxy

Now you can eject most of that from your mind and hold on to one key word: “substitute”.

In computing, a proxy is typically a server that acts as a middleman between a client and another server. It literally is a bit of code that sits between client and server. That’s the crux of proxies.

In case you need a refresher, or aren’t sure of the definitions of client and server: a “client” is a process (code) or machine that requests data from another process or machine (the “server”). The browser is a client when it requests data from a back-end server.

The server serves the client, but it can also be a client when it retrieves data from a database. In that case the database is the server, and our server is both a client (of the database) and a server for the front-end client (the browser).

Source: https://teoriadeisegnali.it/appint/html/altro/bgnet/clientserver.html#figure2

As you can see above, the client-server relationship is bi-directional, so one thing can be both the client and the server. Now picture a middleman server that receives requests, sends them on to another service, and forwards the response from that service back to the originating client. That middleman would be a proxy server.

Going forward, we’ll refer to clients as clients, servers as servers, and proxies as the thing between them.

So when a client sends a request to a server via the proxy, the proxy may sometimes mask the identity of the client. To the server, the IP address on the incoming request may be the proxy’s rather than the originating client’s.
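Here is a toy, in-process model of that masking effect. It is not real networking code. The hypothetical `ForwardProxy` re-issues the client’s request under its own address, so the `Server` only ever sees the proxy. The IP addresses come from the ranges reserved for documentation. Many real proxies pass the original address along in a header such as `X-Forwarded-For`, so masking is not guaranteed.

```python
from dataclasses import dataclass

@dataclass
class Request:
    source_ip: str
    path: str

class Server:
    """Records the address each request appeared to come from."""

    def __init__(self) -> None:
        self.seen_ips: list = []

    def handle(self, req: Request) -> str:
        self.seen_ips.append(req.source_ip)
        return f"content for {req.path}"

class ForwardProxy:
    """Re-sends the client's request from the proxy's own address."""

    def __init__(self, own_ip: str, upstream: Server) -> None:
        self.own_ip = own_ip
        self.upstream = upstream

    def handle(self, req: Request) -> str:
        relayed = Request(source_ip=self.own_ip, path=req.path)
        return self.upstream.handle(relayed)  # response is passed back to the client

server = Server()
proxy = ForwardProxy("203.0.113.7", server)
print(proxy.handle(Request("198.51.100.23", "/news")))
print(server.seen_ips)
```

The server’s log contains only `203.0.113.7`, the proxy’s address, and never the client’s `198.51.100.23`.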

If you access sites or download things that are otherwise restricted (from the torrent network, for example, or sites banned in your country), you may recognize this pattern. It’s the principle on which VPNs are built.

Before we go a bit deeper, I want to call something out. When used generally, the term proxy refers to a “forward” proxy. A forward proxy is one where the proxy acts on behalf of (as a substitute for) the client in the interaction between client and server.

This is distinguished from a reverse proxy, where the proxy acts on behalf of a server. On a diagram the two look the same: the proxy sits between the client and the server, and the data flows the same way, client <-> proxy <-> server.

The key difference is that a reverse proxy is designed to substitute for the server. Often clients won’t even know that their network request was routed through a proxy, which passed it on to the intended server and did the same with the server’s response.

So, with a forward proxy, the server won’t know that the client’s request and its response are traveling through a proxy. With a reverse proxy, the client won’t know that the request and response are routed through a proxy.

Proxies feel kinda sneaky 🙂

But in systems design, especially for complex systems, proxies are useful, and reverse proxies are particularly so. You can delegate to your reverse proxy many tasks you don’t want your main server handling. It can be a gatekeeper, a screener, a load balancer, and an all-around assistant.

So proxies can be useful, but you may not be sure why. If you’ve read my other stuff, you’ll know that I firmly believe you can only understand things properly when you know why they exist. Knowing what they do is not enough.

We’ve talked about VPNs (for forward proxies) and load balancing (for reverse proxies), but there are more examples here. I particularly recommend Clara Clarkson’s high-level summary.

Section 6: Load Balancing

If you think about the two words, load and balance, you will start to get an intuition for what this does in the world of computing. When a server receives a lot of requests at the same time, it can slow down (throughput drops and latency rises). Past a certain point it may even fail (no availability).

You can give the server more muscle (vertical scaling), or you can add more servers (horizontal scaling). But now you have to work out how incoming requests get distributed among the servers. Which requests get routed to which servers, and how do you ensure those servers don’t get overloaded too? In other words, how do you balance and allocate the request load?

Enter load balancers. Since this article is an introduction to principles and concepts, these are necessarily very simplified explanations. A load balancer’s job is to sit between the client and the server, though it can be inserted in other places too. It works out how to distribute incoming requests across multiple servers, so that the end user’s (client’s) experience is consistently fast, smooth, and reliable.

So load balancers are like traffic managers who direct traffic, and they do this to maintain availability and throughput.

Once you understand where a load balancer sits in the system’s architecture, you can see that load balancers can be thought of as reverse proxies. But a load balancer can also be inserted between other components, for example between your server and your database.

The Balancing Act: Server Selection Strategies

So how does the load balancer decide how to route and allocate request traffic? To start with, every time you add a server, you need to let your load balancer know that there is one more candidate for it to route traffic to.

If you remove a server, the load balancer needs to know that too. Configuration ensures that the load balancer knows how many servers are in its go-to list and which ones are available. The load balancer can even be kept informed about each server’s load level, status, availability, current task, and so on.

Once the load balancer knows which servers it can redirect to, we need to work out the best routing strategy to ensure proper distribution among the available servers.

A naive approach is for the load balancer to pick a server at random for each incoming request. But as you can imagine, randomness can produce “unbalanced” allocations, where some servers get more load than others. That can hurt the performance of the overall system.
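The naive random strategy is a one-liner with Python’s `random` module. This sketch uses hypothetical server names and a fixed seed so that runs are repeatable. It tallies 20 random assignments. The counts will typically be uneven rather than exactly 4 per server.

```python
import random
from collections import Counter

servers = ["A", "B", "C", "D", "E"]

def pick_random(pool: list, rng: random.Random) -> str:
    """Naive strategy: each request goes to a uniformly random server."""
    return rng.choice(pool)

rng = random.Random(7)  # seeded only so the run is repeatable
counts = Counter(pick_random(servers, rng) for _ in range(20))
print(counts)
```

Over many requests the averages even out, but over short bursts some servers can end up with noticeably more work than others.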

Round Robin and Weighted Round Robin

Another intuitive method is called “round robin”. This is how many people work through a list in a loop. You start at the first item, move down in sequence, and when you finish the last item you go back to the top and start working down the list again.

The load balancer can do this too, by looping through the available servers in a fixed sequence. This way the load is distributed fairly evenly across your servers in a simple, predictable pattern.
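In Python, `itertools.cycle` produces exactly this endless loop over a fixed list, which makes for a compact sketch of round robin. A real load balancer would also need to handle servers being added or removed, which this hypothetical fixed list ignores.

```python
from itertools import cycle

servers = ["Server A", "Server B", "Server C"]
next_server = cycle(servers)  # endless A, B, C, A, B, C, ...

# Assign seven incoming requests in turn.
assignments = [next(next_server) for _ in range(7)]
print(assignments)
```

The fourth request wraps back around to Server A, and so does the seventh.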

You can get a little more “fancy” with round robin by “weighting” some servers over others. In standard round robin, each server gets equal weight (let’s say a weight of 1 each). With weighted round robin, less powerful servers can get a lower weight (say 0.5), and others can get higher weights, like 0.7, 0.9, or even 1.

The total traffic is then split in proportion to those weights, so each server receives a volume of requests proportional to its power.
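Below is one possible way to implement weighting. It uses a “smooth” weighted round-robin variant, which interleaves picks instead of sending each server its whole share in one burst. For simplicity it uses integer weights. Fractional weights like 0.5 and 1 can be scaled to 1 and 2, since only the ratios matter. The class and server names are hypothetical.

```python
class SmoothWeightedRoundRobin:
    """Gives each server a share of picks proportional to its weight,
    interleaved rather than sent in bursts."""

    def __init__(self, weights: dict) -> None:
        self.weights = weights
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every server earns its weight each round. The server with the most
        # credit is chosen and pays back the total, so heavier servers win
        # more often, but not every time.
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen

lb = SmoothWeightedRoundRobin({"big": 5, "small-1": 1, "small-2": 1})
print([lb.pick() for _ in range(7)])
```

Over these 7 picks, “big” is chosen 5 times and each small server once, in the order `['big', 'big', 'small-1', 'big', 'small-2', 'big', 'big']`.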

Load-Based Server Selection

More sophisticated load balancers can work out the current capacity, performance, and loads of the servers in their go-to list and allocate dynamically according to current loads and calculations as to which will have the highest throughput, lowest latency etc. It would do this by monitoring the performance of each server and deciding which ones can and cannot handle the new requests.
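A common load-based strategy is “least connections”: each new request goes to the server with the fewest active connections. This minimal sketch assumes the balancer already knows each backend’s connection count. In practice that count comes from the balancer’s own bookkeeping or from health checks and metrics. The `Backend` class and the counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    active_connections: int = 0

def pick_least_loaded(pool: list) -> Backend:
    """Route to the backend currently handling the fewest connections."""
    return min(pool, key=lambda b: b.active_connections)

pool = [Backend("A", 12), Backend("B", 3), Backend("C", 7)]
chosen = pick_least_loaded(pool)
chosen.active_connections += 1  # the balancer records the new request
print(chosen.name)
```

Backend B is chosen because it has only 3 active connections. Its count rises to 4, so later picks reflect the new load.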

IP Hashing based selection

You can configure your load balancer to hash the IP address of incoming requests and use the hash value to determine which server to direct the request to. If I had 5 servers available, the hash function would be designed to return one of five hash values, so one of the servers definitely gets nominated to process the request.
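Here is a minimal sketch of IP-hash selection. It uses `hashlib.sha256` rather than Python’s built-in `hash()`. `hash()` on strings is randomized per process, so it would send the same IP to different servers after a restart. The function name and addresses are hypothetical.

```python
import hashlib

servers = ["Server A", "Server B", "Server C", "Server D", "Server E"]

def server_for_ip(client_ip: str, pool: list) -> str:
    """Map a client IP to one server; the same IP always lands on the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(pool)  # 0 .. len(pool) - 1
    return pool[bucket]

print(server_for_ip("198.51.100.23", servers))
print(server_for_ip("198.51.100.23", servers))  # same server again
```

Because the mapping is deterministic, a server that has cached work for this client will keep receiving that client’s requests.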

IP hash based routing can be very useful where you want requests from a certain country or region to get data from a server that is best suited to address the needs from within that region, or where your servers cache requests so that they can be processed fast.

In the latter scenario, you want to ensure that the request goes to a server that has previously cached the same request, as this will improve speed and performance in processing and responding to that request.

If your servers each maintain independent caches and your load balancer does not consistently send identical requests to the same server, you will end up with servers re-doing work that was already done for a previous request on another server, and you lose the optimization that comes with caching data.

Path or Service based selection

You can also get the load balancer to route requests based on their “path” or function or service that is being provided. For example if you’re buying flowers from an online florist, requests to load the “Bouquets on Special” may be sent to one server and credit card payments may be sent to another server.

If only one in twenty visitors actually bought flowers, then you could have a smaller server processing the payments and a bigger one handling all the browsing traffic.
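Path-based routing can be sketched as a lookup table from URL path prefixes to backends, where the most specific (longest) matching prefix wins. The paths and server names are hypothetical and loosely follow the florist example. Real reverse proxies and load balancers have their own, richer matching rules.

```python
# Hypothetical routing table for the florist example.
ROUTES = {
    "/checkout/payments": "payments-server",
    "/bouquets": "catalog-server",
    "/": "catalog-server",  # fallback for everything else
}

def route(path: str) -> str:
    """Pick the backend whose path prefix matches the request most specifically."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    return ROUTES[max(matches, key=len)]

print(route("/bouquets/on-special"))
print(route("/checkout/payments/card"))
```

Browsing traffic lands on `catalog-server`, while card payments go to `payments-server`. This lets each backend be sized for its own share of the traffic.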

Mixed Bag

And as with all things, you can get to higher and more detailed levels of complexity. You can have multiple load balancers that each have different server selection strategies! And if yours is a very large and highly trafficked system, then you may need load balancers for load balancers…

Ultimately, you add pieces to the system until your performance is tuned to your needs (your needs may look flat, or slope upwards mildly over time, or be prone to spikes!).

Section 7: Consistent Hashing

One of the slightly more tricky concepts to understand is hashing in the context of load balancing. So it gets its own section.

In order to understand this, please first understand how hashing works at a conceptual level. The TL;DR is that hashing converts an input into a fixed-size value, often an integer value (the hash).

One of the key principles for a good hashing algorithm or function is that the function must be deterministic, which is a fancy way of saying that identical inputs will generate identical outputs when passed into the function. So, deterministic means that if I pass in the string “Code” (case sensitive) and the function generates a hash of 11002, then every time I pass in “Code” it must generate “11002” as an integer. And if I pass in “code”, it will generate a different number (consistently).
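It is worth checking determinism explicitly in Python. The built-in `hash()` for strings is salted per process (see `PYTHONHASHSEED`), so its results are not stable across runs. A cryptographic digest from `hashlib` is deterministic everywhere. The `stable_hash` helper is a hypothetical name, and the integers it produces will not match the illustrative 11002 above.

```python
import hashlib

def stable_hash(text: str) -> int:
    """Deterministic across runs and machines: same input, same integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

print(stable_hash("Code") == stable_hash("Code"))  # True: deterministic
print(stable_hash("Code") == stable_hash("code"))  # False: case matters
```

The first comparison holds on every run and every machine. The second shows that changing a single character changes the output.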

Sometimes the hashing function can generate the same hash for more than one input. This is not the end of the world, and there are ways to deal with it. In fact, it becomes more likely the larger the range of unique inputs is. When more than one input deterministically generates the same output, it’s called a “collision”.

With this firmly in mind, let’s apply it to routing and directing requests to servers. Let’s say you have 5 servers to allocate loads across. An easy-to-understand method would be to hash each incoming request (maybe by IP address, or some other client detail) and then apply the modulo operator to that hash, where the right operand is the number of servers.

For example, this is what your load balancer’s pseudocode could look like:


request#1 => hashes to 34
request#2 => hashes to 23
request#3 => hashes to 30
request#4 => hashes to 14

// You have 5 servers => [Server A, Server B, Server C, Server D, Server E]

// so modulo 5 for each request...

request#1 => hashes to 34 => 34 % 5 = 4 => send this request to servers[4] => Server E

request#2 => hashes to 23 => 23 % 5 = 3 => send this request to servers[3] => Server D

request#3 => hashes to 30 => 30 % 5 = 0 => send this request to servers[0] => Server A

request#4 => hashes to 14 => 14 % 5 = 4 => send this request to servers[4] => Server E
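The pseudocode above translates almost line for line into Python. This sketch takes the request hashes from the example as given (how they were computed is left open) and derives the modulo operand from the length of the server list.

```python
SERVERS = ["Server A", "Server B", "Server C", "Server D", "Server E"]

def route(request_hash: int, servers: list) -> str:
    """Naive load balancing: hash modulo the number of servers picks the server index."""
    return servers[request_hash % len(servers)]

# Request hashes taken from the walkthrough above.
request_hashes = {"request#1": 34, "request#2": 23, "request#3": 30, "request#4": 14}

for name, h in request_hashes.items():
    print(f"{name} => {h} % {len(SERVERS)} = {h % len(SERVERS)} => {route(h, SERVERS)}")
# request#1 => 34 % 5 = 4 => Server E
# request#2 => 23 % 5 = 3 => Server D
# request#3 => 30 % 5 = 0 => Server A
# request#4 => 14 % 5 = 4 => Server E
```

Because the operand is `len(servers)`, appending a sixth server does start sending it traffic, but it also changes `h % len(servers)` for most existing hashes, which is the caching problem discussed below.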

As you can see, the hashing function generates a wide spread of possible values, and the modulo operator squeezes them into a much smaller range (0 to 4 here) that maps directly to a server index.

You will definitely get different requests that map to the same server, and that’s fine, as long as there is “uniformity” in the overall allocation to all the servers.

Adding Servers, and Handling Failing Servers

So what happens if one of the servers that we are sending traffic to dies? The load balancer (refer to the pseudocode above) still thinks there are 5 servers, so the modulo operator keeps generating indexes from 0 to 4. But we only have 4 healthy servers now, and we are still sending traffic to the one that failed. Oops.

Conversely, we could add a sixth server, but it would never get any traffic: with a modulo operand of 5, the result can never be 5, which is the index of the newly added 6th server. Double oops.

// Let's add a 6th server
servers => [Server A, Server B, Server C, Server D, Server E, Server F]

// let's change the modulo operand to 6
request#1 => hashes to 34 => 34 % 6 = 4 => send this request to servers[4] => Server E

request#2 => hashes to 23 => 23 % 6 = 5 => send this request to servers[5] => Server F

request#3 => hashes to 30 => 30 % 6 = 0 => send this request to servers[0] => Server A

request#4 => hashes to 14 => 14 % 6 = 2 => send this request to servers[2] => Server C

Notice that the server index produced by the modulo changes for some requests (though, in this example, not for request#1 and request#3 – but that is just because the numbers happened to work out that way in this specific case).

In effect, the result is that half the requests (could be more in other examples!) are now being routed to new servers altogether, and we lose the benefits of previously cached data on the servers.

For example, request#4 used to go to Server E, but now goes to Server C. All the cached data relating to request#4 sitting on Server E is of no use, since the request is now going to Server C. You can work through a similar problem for the case where one of your servers dies but the modulo operation keeps sending it requests.

It sounds minor in this tiny system. But on a very large scale system this is a poor outcome. #SystemDesignFail.
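To see why this hurts at scale, here is a rough Python experiment (the 100,000 client keys are made up for illustration). For uniformly distributed hashes, `h % 5` and `h % 6` agree only when `h % 30` is between 0 and 4, so only about 1 key in 6 keeps its server and roughly 5 in 6 get moved. The 5 to 4 case models shrinking the operand after a server dies.

```python
import hashlib

def hash_to_int(key: str) -> int:
    """Deterministic 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose server index changes when the server count goes from old_n to new_n."""
    hashes = [hash_to_int(k) for k in keys]
    moved = sum(1 for h in hashes if h % old_n != h % new_n)
    return moved / len(hashes)

keys = [f"client-{i}" for i in range(100_000)]
print(f"5 -> 6 servers: {fraction_moved(keys, 5, 6):.1%} of keys move")  # about 83%
print(f"5 -> 4 servers: {fraction_moved(keys, 5, 4):.1%} of keys move")  # about 80%
```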

So clearly, a simple hashing-to-allocate system does not scale or handle failures well.

A popular solution – consistent hashing

Unfortunately this is the part where I feel word descriptions will not be enough. Consistent hashing is best understood visually. But the purpose of this post so far is to give you an intuition around the problem, what it is, why it arises, and what the shortcomings in a basic solution might be. Keep that firmly in mind.

The key problem with naive hashing, as we discussed, is that when (A) a server fails, traffic still gets routed to it, and (B) you add a new server, the allocations can get substantially changed, thus losing the benefits of previous caches.

There are two very important things to keep in mind when digging into consistent hashing:

1. Consistent hashing does not eliminate the problems, especially B. But it does reduce them a lot. At first you might wonder what the big deal is with consistent hashing, since the underlying downside still exists. It does, but to a much smaller extent, and that by itself is a valuable improvement in very large scale systems.
2. Consistent hashing applies a hash function to both the incoming requests and the servers. The resulting outputs therefore fall within the same fixed range (continuum) of values. This detail is very important.

Please keep these in mind as you watch the recommended video below, as otherwise the benefits of consistent hashing may not be obvious.

I strongly recommend this video as it embeds these principles without burdening you with too much detail.

A Brief Introduction to Consistent Hashing (https://www.youtube.com/watch?v=tHEyzVbl4bg)

A brief intro to consistent hashing by Hannah Barton
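If it helps to see the ring in code, here is a minimal Python sketch of one common way to implement consistent hashing, assuming MD5 for ring positions and 100 “virtual nodes” per server (both arbitrary choices for illustration; the video may present the idea differently). Servers and request keys are hashed onto the same circle, and each key goes to the first server point found clockwise from it. The 10,000 client keys are made up.

```python
import bisect
import hashlib

def ring_position(key: str) -> int:
    """Place any string (server name or request key) on the same 0..2**64-1 circle."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes      # virtual nodes per server, to even out the load
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = ring_position(f"{server}#{i}")
            bisect.insort(self._points, pos)
            self._owner[pos] = server

    def remove(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = ring_position(f"{server}#{i}")
            self._points.remove(pos)
            del self._owner[pos]

    def route(self, key: str) -> str:
        # First server point clockwise from the key; wrap to the start past the end.
        idx = bisect.bisect_left(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["Server A", "Server B", "Server C", "Server D", "Server E"])
keys = [f"client-{i}" for i in range(10_000)]
before = {k: ring.route(k) for k in keys}

ring.add("Server F")
moved = [k for k in keys if ring.route(k) != before[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved")   # roughly 1 in 6
print(all(ring.route(k) == "Server F" for k in moved))  # True: keys only move to the new server
```

Plain modulo hashing, by contrast, would move most keys. Removing a server is the mirror image: only the keys that server owned get handed to their clockwise neighbours.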

If you’re having a little trouble really understanding why this strategy is important in load balancing, I suggest you take a break, then return to the load balancing section and then re-read this again. It’s not uncommon for all this to feel very abstract unless you’ve directly encountered the problem in your work!

Section 8: Databases

We briefly considered that there are different types of storage solutions (databases) designed to suit a number of different use-cases, and some are more specialized for certain tasks than others. At a very high level though, databases can be categorized into two types: Relational and Non-Relational.

Relational Databases

A relational database is one that has strictly enforced relationships between the things stored in it. These relationships are typically made possible by requiring the database to represent each such thing (called an “entity”) as a structured table, with zero or more rows (“records”, “entries”) and one or more columns (“attributes”, “fields”).

By forcing such a structure on an entity, we can ensure that each item/entry/record has the right data to go with it. It makes for better consistency and the ability to make tight relationships between the entities.

You can see this structure in the table recording “Baby” (entity) data below. Each record (“entry”) in the table has 4 fields, which represent data relating to that baby. This is a classic relational database structure (and a formalized entity structure is called a schema).

source: https://web.stanford.edu/class/cs101/table-1-data.html

So the key feature to understand about relational databases is that they are highly structured and impose that structure on all the entities. This structure is enforced by ensuring that data added to a table conforms to it. For example, adding a height field to a record when the table’s schema doesn’t allow for it will not be permitted.
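Here is a small illustration using SQLite through Python’s built-in `sqlite3` module (SQLite rather than PostgreSQL only because it needs no server). The rank and year values are placeholders. SQLite is more relaxed than PostgreSQL about column types, but it still refuses a column that the schema doesn’t define.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: every record in `babies` has exactly these four fields.
conn.execute("CREATE TABLE babies (name TEXT, rank INTEGER, gender TEXT, year INTEGER)")
conn.execute(
    "INSERT INTO babies (name, rank, gender, year) VALUES (?, ?, ?, ?)",
    ("Jacob", 1, "M", 2000),  # placeholder rank and year
)

try:
    # `height` is not part of the schema, so the database rejects this write.
    conn.execute(
        "INSERT INTO babies (name, rank, gender, year, height) VALUES (?, ?, ?, ?, ?)",
        ("Isabella", 1, "F", 2000, 50),
    )
    rejected = False
except sqlite3.OperationalError as err:
    rejected = True
    print("Rejected:", err)

print(conn.execute("SELECT COUNT(*) FROM babies").fetchone()[0])  # 1
```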

Most relational databases support a query language called SQL (Structured Query Language). This is a language specifically designed to interact with the contents of a structured (relational) database. The two concepts are so tightly coupled that people often refer to a relational database as a “SQL database” (sometimes pronounced “sequel” database).

In general, it is considered that SQL (relational) databases support more complex queries (combining different fields and filters and conditions) than non-relational databases. The database itself handles these queries and sends back matching results.

Many SQL database fans argue that without that capability, you would have to fetch all the data and then have the server or the client load it “in memory” and apply the filtering conditions. That is OK for small sets of data, but for a large, complex dataset with millions of rows it would badly affect performance. However, this is not always the case, as we will see when we learn about NoSQL databases.
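A sketch of the two options side by side, again using SQLite via Python’s `sqlite3` module and a handful of made-up rows. Both produce the same answer; the difference is where the filtering work happens and how much data has to travel out of the database to get it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE babies (name TEXT, rank INTEGER, gender TEXT, year INTEGER)")
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO babies VALUES (?, ?, ?, ?)",
    [("Jacob", 1, "M", 2000), ("Isabella", 1, "F", 2000), ("Ethan", 2, "M", 2000),
     ("Sophia", 2, "F", 2000), ("Emma", 1, "F", 1999)],
)

# Option 1: the database applies the filters and returns only matching rows.
in_db = [name for (name,) in conn.execute(
    "SELECT name FROM babies WHERE gender = ? AND year = ? AND rank <= ? ORDER BY rank",
    ("F", 2000, 2),
)]

# Option 2: fetch every row, then filter in application memory.
all_rows = conn.execute("SELECT name, rank, gender, year FROM babies").fetchall()
in_app = [name for name, rank, gender, year in sorted(all_rows, key=lambda r: r[1])
          if gender == "F" and year == 2000 and rank <= 2]

print(in_db)   # ['Isabella', 'Sophia']
print(in_app)  # ['Isabella', 'Sophia']
```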

A common and much-loved example of a relational database is the PostgreSQL (often called “Postgres”) database.

ACID

ACID transactions are a set of properties that describe the transactions a good relational database will support. ACID stands for “Atomicity, Consistency, Isolation, Durability”. A transaction is an interaction with a database, typically made up of read or write operations.

Atomicity requires that when a single transaction consists of more than one operation, the database must guarantee that if one operation fails, the entire transaction (all operations) fails. It’s “all or nothing”. That way, if the transaction succeeds, you know on completion that all the sub-operations completed successfully, and if an operation fails, you know that none of the operations that went with it took effect.

For example, if a single transaction involved reading from two tables and writing to three, and any one of those individual operations fails, the entire transaction fails. This means that none of its writes should be applied. You would not want even 1 of the 3 write operations to take effect, as that would “dirty” the data in your database!
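Here is a minimal sketch of atomicity, again with SQLite through Python’s `sqlite3` module. The `accounts` table and `transfer` function are hypothetical. Using the connection as a context manager (`with conn:`) commits the transaction if the block succeeds and rolls it back if an exception escapes, so a failed second statement also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    with conn:  # commit if the block succeeds, roll back everything if it raises
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))

transfer(conn, "alice", "bob", 30)       # both updates commit

try:
    # The credit to bob runs first, then the debit drives alice negative and
    # violates the CHECK constraint, so the credit is rolled back too.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError as err:
    print("Transaction rolled back:", err)

print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 70), ('bob', 30)]
```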

Consistency requires that each transaction in a database is valid according to the database’s defined rules, and when the database changes state (some information has changed), such change is valid and does not corrupt the data. Each transaction moves the database from one valid state to another valid state.

Isolation means that you can run multiple transactions on a database “concurrently” (at the same time), but the database will end up in a state that looks as though the transactions had been run serially (in a sequence, like a queue of operations). I personally think “Isolation” is not a very descriptive term for the concept, but I guess ACCD is less easy to say than ACID…

Durability is the promise that once data has been committed to the database, it will remain there. It will be “persistent”: stored on disk and not just in “memory”.

Non-relational databases

In contrast, a non-relational database has a less rigid, or, put another way, a more flexible structure to its data. The data typically is presented as “key-value” pairs. A simple way of representing this would be as an array (list) of “key-value” pair objects, for example:

// baby names
[
  {
    name: "Jacob",
    rank: ##,
    gender: "M",
    year: ####
  },
  {
    name: "Isabella",
    rank: ##,
    gender: "F",
    year: ####
  },
  {
    // ...
  },

  // ...
]

Non-relational databases are also referred to as “NoSQL” databases, and offer benefits when you do not want or need consistently structured data.

Similar to the ACID properties, NoSQL database properties are sometimes referred to as BASE:

Basically Available, which states that the system guarantees availability

Soft State, which means the state of the system may change over time, even without input

Eventual Consistency, which states that the system will become consistent over a period of time, provided it receives no new input during that period

Since, at their core, these key-value databases hold data in a hash-table-like structure, they are extremely fast, simple and easy to use, and are perfect for use cases like caching, environment variables, configuration files and session state. This flexibility makes them well suited to in-memory use (e.g. Memcached) as well as persistent storage (e.g. DynamoDB).
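To illustrate the kind of key-value usage described here, this is a toy in-memory store with per-key expiry, roughly the behaviour you’d want for a cache or session state. `TTLCache` is a hypothetical class written for this example, not the API of Memcached or DynamoDB. The clock is injectable so the example can run without actually waiting.

```python
import time

class TTLCache:
    """Toy in-memory key-value store where each entry expires after ttl_seconds."""

    def __init__(self, clock=time.monotonic):
        self._data = {}       # key -> (value, expires_at); a plain hash table underneath
        self._clock = clock   # injectable so the example can be tested without sleeping

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]   # lazily evict the expired entry on read
            return default
        return value

now = [0.0]                                   # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.set("session:42", {"user": "ada"}, ttl_seconds=30)
print(cache.get("session:42"))                # {'user': 'ada'}
now[0] = 31.0                                 # 31 seconds later
print(cache.get("session:42"))                # None
```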

There are other “JSON-like” databases, called document databases, like the well-loved MongoDB, and at their core these are also “key-value” stores.

Database Indexing

This is a complicated topic so I will simply skim the surface for the purpose of giving you a high level overview of what you need for systems design interviews.

Imagine a database table with 100 million rows that is used mainly to look up one or two values in each record. Without any help, retrieving the values for a specific row means iterating over the table, and if it’s the very last record, that would take a long time!

Indexing is a way of shortcutting to the records with matching values more efficiently than going through every row. An index is typically a data structure added to the database that is designed to make searching on specific attributes (fields) fast.

So if the census bureau has 120 million records with names and ages, and you most often need to retrieve lists of people belonging to an age group, then you would index that database on the age attribute.
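A toy illustration of the idea in Python, assuming a list of `(row_id, name, age)` tuples stands in for the table (ages are generated arithmetically, not real data). Real databases usually use B-tree structures for this; a sorted list searched with `bisect` captures the same principle of jumping straight to the matching range instead of scanning every row.

```python
import bisect

# Hypothetical table: 100,000 rows of (row_id, name, age) with synthetic ages 0-99.
rows = [(i, f"person-{i}", (i * 37) % 100) for i in range(100_000)]

def scan_age_range(rows, lo, hi):
    """No index: inspect every row, O(n)."""
    return [row_id for row_id, _, age in rows if lo <= age <= hi]

# Build the index once: (age, row_id) pairs kept in sorted order.
age_index = sorted((age, row_id) for row_id, _, age in rows)

def indexed_age_range(index, lo, hi):
    """With the index: binary-search to the boundaries, O(log n) plus the size of the result."""
    start = bisect.bisect_left(index, (lo, -1))
    end = bisect.bisect_right(index, (hi, float("inf")))
    return [row_id for _, row_id in index[start:end]]

matches = indexed_age_range(age_index, 30, 39)
print(len(matches), sorted(matches) == scan_age_range(rows, 30, 39))  # 10000 True
```

The trade-off is that the index has to be built and kept up to date on every write, which is why you index the attributes you actually query on rather than everything.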

Indexing is core to relational databases and is also widely offered on non-relational databases. The benefits of indexing are thus available in theory for both types of databases, and this is hugely beneficial to optimise lookup times.

Replication and Sharding

While these may sound like things out of a bio-terrorism movie, you’re more likely to hear them every day in the context of database scaling.

Replication means duplicating (making copies of, replicating) your database. You may remember this from when we discussed availability.

We had considered the benefits of having redundancy in a system to maintain high availability. Replication provides redundancy for the database: if one copy goes down, another can take over. But it also raises the question of how to keep data in sync across the replicas, since they’re meant to hold the same data. Replicating write and update operations can happen synchronously (at the same time as the changes to the main database) or asynchronously (some time afterwards).

The acceptable time interval between synchronising the main and a replica database really depends on your needs – if you really need state between the two databases to be consistent then the replication needs to be rapid. You also want to ensure that if the write operation to the replica fails, the write operation to the main database also fails (atomicity).

But what do you do when you’ve got so much data that simply replicating it may solve availability issues but does not solve throughput and latency issues (speed)?

At this point you may want to consider “chunking down” your data, into “shards”. Some people also call this partitioning your data (which is different from partitioning your hard drive!).

Sharding data breaks your huge database into smaller databases. You can work out how you want to shard your data depending on its structure. It could be as simple as every 5 million rows are saved in a different shard, or go for other strategies that best fit your data, needs and locations served.
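Two simple shard-routing functions in Python, as a sketch. `range_shard` implements the “every 5 million rows” idea from the paragraph above; `hash_shard` is a common alternative that spreads keys more evenly but, like the naive load balancer in Section 7, reshuffles most keys if the shard count changes. Both function names are hypothetical.

```python
import hashlib

ROWS_PER_SHARD = 5_000_000

def range_shard(row_id: int) -> int:
    """Range-based: rows 0..4,999,999 go to shard 0, the next 5 million to shard 1, and so on."""
    return row_id // ROWS_PER_SHARD

def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based: even spread, but a range query may have to visit every shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

print(range_shard(0), range_shard(4_999_999), range_shard(5_000_000), range_shard(12_345_678))
# 0 0 1 2
print(hash_shard("user-1", 8) == hash_shard("user-1", 8))  # True: same key, same shard
```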

Section 9: Leader Election

Let’s move back to servers again for a slightly more advanced topic. We already understand the principle of Availability, and how redundancy is one way to increase availability. We have also walked through some practical considerations when handling the routing of requests to clusters of redundant servers.

But sometimes, with this kind of setup where multiple servers are doing much the same thing, there can arise situations where you need only one server to take the lead.

For example, you want to ensure that only one server is given the responsibility for updating some third-party API, because multiple updates from different servers could cause issues or run up costs on the third party’s side.

In this case you need to choose that primary server to delegate this update responsibility to. That process is called leader election.

When multiple servers are in a cluster to provide redundancy, they could, amongst themselves, be configured to have one and only one leader. They would also detect when that leader server has failed, and appoint another one to take its place.

The principle is very simple, but the devil is in the details. The really tricky part is ensuring that the servers are “in sync” in terms of their data, state and operations.

There is always the risk that certain outages could result in one or two servers being disconnected from the others, for example. In that case, engineers end up using some of the underlying ideas that are used in blockchain to derive consensus values for the cluster of servers.

In other words, a consensus algorithm is used to give all the servers an “agreed on” value that they can all rely on in their logic when identifying which server is the leader.

Leader Election is commonly implemented with software like etcd, which is a key-value store that offers both high availability and strong consistency (a valuable and unusual combination) by using leader election itself, together with a consensus algorithm (Raft).

So engineers can rely on etcd’s own leader election architecture to implement leader election in their systems. This is done by storing, in a service like etcd, a key-value pair that represents the current leader.

Since etcd is highly available and strongly consistent, your system can always rely on that key-value pair as the final “source of truth” for which server in your cluster is the currently elected leader.
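To make the “leader key” idea concrete, here is a toy, single-process Python stand-in. `LeaseStore` is hypothetical and is not etcd’s client API; it only mimics the behaviour leader election relies on: a key that one candidate can claim atomically and that expires unless it is renewed (etcd calls these leases), so a crashed leader’s claim eventually lapses and another server can take over.

```python
import threading
import time

class LeaseStore:
    """Toy stand-in for a strongly consistent key-value store (not etcd's API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()   # makes check-and-set atomic within this process
        self._leases = {}               # key -> (holder, expires_at)

    def try_acquire(self, key, candidate, ttl):
        """Claim `key` if it is free or expired; renew it if `candidate` already holds it."""
        with self._lock:
            now = self._clock()
            current = self._leases.get(key)
            if current is None or current[1] <= now or current[0] == candidate:
                self._leases[key] = (candidate, now + ttl)
                return True
            return False

    def leader(self, key):
        with self._lock:
            current = self._leases.get(key)
            if current is None or current[1] <= self._clock():
                return None
            return current[0]

now = [0.0]                                    # fake clock, in seconds
store = LeaseStore(clock=lambda: now[0])
print(store.try_acquire("/service/leader", "server-1", ttl=10))  # True: first claim wins
print(store.try_acquire("/service/leader", "server-2", ttl=10))  # False: lease still held
now[0] = 11.0                                  # server-1 stopped renewing (say it crashed)
print(store.try_acquire("/service/leader", "server-2", ttl=10))  # True: server-2 takes over
print(store.leader("/service/leader"))                           # server-2
```

In a real cluster the store itself must be replicated and consistent, which is exactly the hard part that etcd’s consensus algorithm handles and this toy skips.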

Section 10: Polling, Streaming, Sockets

In the modern age of continuous updates, push notifications, streaming content and real-time data, it is important to grasp the basic principles that underpin these technologies. To have data in your application updated regularly or instantly requires the use of one of the two following approaches.

Polling

This one is simple. If you look at the Wikipedia entry you may find it a bit intense, so instead take a look at its dictionary meaning, especially in the context of computer science. Keep that simple fundamental in mind.

Polling is simply having your client “check” by sending a network request to your server and asking for updated data. These requests are typically made at regular intervals like 5 seconds, 15 seconds, 1 minute or any other interval required by your use case.

Polling every few seconds is still not quite the same as real-time, and also comes with the following downsides, especially if you have a million plus simultaneous users:

almost-constant outbound network requests (not great for the client)
almost-constant inbound requests (not great for server load – 1 million+ requests per second!)

So rapid polling is not really efficient or performant, and polling is best used when small gaps in data updates are not a problem for your application.

For example, if you built an Uber clone, you may have the driver-side app send driver location data every 5 seconds, and your rider-side app poll for the driver’s location every 5 seconds.
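The rider-side polling loop might look something like this minimal Python sketch. `fetch` and `handle` are hypothetical callbacks (for example, an HTTP call to your backend and a function that moves the car on the map), and `sleep` is injectable so the example runs instantly. Note that every tick costs a request even when nothing changed: at a 5-second interval, a million riders would still generate about 200,000 requests per second.

```python
import time

def poll(fetch, handle, interval_seconds=5.0, max_polls=None, sleep=time.sleep):
    """Ask the server for fresh data every `interval_seconds`, reacting only to changes."""
    polls = 0
    last_seen = None
    while max_polls is None or polls < max_polls:
        data = fetch()          # one request per tick, whether or not anything changed
        if data != last_seen:
            handle(data)        # react only when the data actually changed
            last_seen = data
        polls += 1
        sleep(interval_seconds)

# Simulated server responses: the driver sits still, then moves.
responses = iter([(1, 1), (1, 1), (2, 3), (2, 3)])
updates = []
poll(fetch=lambda: next(responses), handle=updates.append, max_polls=4, sleep=lambda s: None)
print(updates)  # [(1, 1), (2, 3)]: 4 requests, only 2 useful
```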

Streaming

Streaming solves the constant polling problem. If your client would otherwise need to hit the server constantly, it’s better to use something called web-sockets.

This is a network communication protocol that is designed to work over TCP. It opens a two-way dedicated channel (socket) between a client and server, kind of like an open hotline between two endpoints.

Unlike the usual HTTP request-response exchange, these sockets are “long-lived”: a single request to the server opens up this hotline for the two-way transfer of data, rather than multiple separate requests. By long-lived, we mean that the socket connection between the machines lasts until either side closes it or the network drops.

You may remember from our discussion on IP, TCP and HTTP that these operate by sending “packets” of data for each request-response cycle. Web-sockets mean that there is a single request-response interaction (not really a cycle if you think about it!), which opens up the channel through which two-way data is sent in a “stream”.

The big difference between polling (and all “regular” IP-based communication) and streaming is this: with polling, the client makes requests to the server for data at regular intervals (“pulling” data), whereas in streaming, the client is “on standby”, waiting for the server to “push” data its way. The server sends out data when it changes, and the client is always listening for it. Hence, if the data changes constantly, it becomes a “stream”, which may better suit what the user needs.
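Here is a sketch of the push model using Python’s standard `socket` module. `socket.socketpair()` gives two already-connected sockets that stand in for the client and server, so there’s nothing to set up; a real WebSocket connection additionally starts with an HTTP upgrade handshake and frames its messages, which this example leaves out. The point is the shape: one open connection, several messages pushed by the server, and a client that simply listens. The message strings are made up.

```python
import socket
import threading

def server_push(conn: socket.socket, updates):
    """Server side: push each update down the already-open connection."""
    with conn:
        for update in updates:
            conn.sendall((update + "\n").encode("utf-8"))
    # Leaving the `with` block closes the connection, which ends the client's stream.

def client_listen(conn: socket.socket):
    """Client side: no repeated requests, just read whatever arrives until the server closes."""
    received = []
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            received.append(line.rstrip("\n"))
    return received

server_end, client_end = socket.socketpair()
pusher = threading.Thread(target=server_push,
                          args=(server_end, ["driver at (1, 1)", "driver at (2, 3)"]))
pusher.start()
messages = client_listen(client_end)
pusher.join()
print(messages)  # ['driver at (1, 1)', 'driver at (2, 3)']
```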

For example, in collaborative coding IDEs, when either user types something, it can show up on the other user’s screen. This is done via web-sockets because you want real-time collaboration. It would suck if what I typed showed up on your screen after you had already tried to type the same thing, or after 3 minutes of you waiting and wondering what I was doing!

Or think of online, multiplayer games – that is a perfect use case for streaming game data between players!

To conclude, the use case determines the choice between polling and streaming. In general, you want to stream if your data is “real-time”, and if it’s OK to have a lag (as little as 15 seconds is still a lag) then polling may be a good option. But it all depends on how many simultaneous users you have and whether they expect the data to be instantaneous. A commonly used example of a streaming service is Apache Kafka.

Section 11: Endpoint Protection

When you build large scale systems it becomes important to protect your system from too many operations, where such operations are not actually needed to use the system. Now that sounds very abstract. But think of this – how many times have you clicked furiously on a button thinking it’s going to make the system more responsive? Imagine if each one of those button clicks pinged a server and the server tried to process them all! If the throughput of the system is low for some reason (say a server was struggling under unusual load) then each of those clicks would have made the system even slower because it has to process them all!

Sometimes it’s not even about protecting the system. Sometimes you want to limit operations because that is part of your service. For example, you may have used free tiers on third-party API services where you’re only allowed to make 20 requests per 30-minute interval. If you make 21 or 300 requests in a 30-minute interval, the server will stop processing your requests after the first 20.

That is called rate-limiting. Using rate-limiting, a server can limit the number of operations attempted by a client in a given window of time. A rate-limit can be calculated on users, requests, times, payloads, or other things. Typically, once the limit is exceeded in a time window, for the rest of that window the server will return an error.
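A minimal fixed-window rate limiter in Python, matching the “20 requests per 30 minutes” example. It keeps its counters in a local dictionary, which, as this section goes on to note, only works if all of a client’s requests reach the same server. The class name and the fixed-window strategy are my choices for illustration; sliding-window and token-bucket approaches are common alternatives. A rejected request would typically get an HTTP 429 Too Many Requests response.

```python
import time

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed window of `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counters = {}  # client_id -> (window_start, count), kept in this process only

    def allow(self, client_id):
        now = self._clock()
        window_start = now - (now % self.window_seconds)
        start, count = self._counters.get(client_id, (window_start, 0))
        if start != window_start:   # a new window has begun, so the count resets
            start, count = window_start, 0
        if count >= self.limit:
            return False            # over the limit: reject for the rest of this window
        self._counters[client_id] = (start, count + 1)
        return True

now = [0.0]                                   # fake clock, in seconds
limiter = FixedWindowRateLimiter(limit=20, window_seconds=30 * 60, clock=lambda: now[0])
results = [limiter.allow("client-1") for _ in range(21)]
print(results.count(True), results.count(False))  # 20 1

now[0] = 30 * 60.0                            # the next 30-minute window begins
next_window_ok = limiter.allow("client-1")
print(next_window_ok)                         # True
```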

Ok, now you might think that endpoint “protection” is an exaggeration: you’re just restricting the user’s ability to get something out of the endpoint. True, but it is also protection when the user (client) is malicious, like a bot that is smashing your endpoint. Why would that happen? Because flooding a server with more requests than it can handle is a strategy malicious folks use to bring down that server, which effectively brings down the service. That’s exactly what a Denial of Service (DoS) attack is.

While DoS attacks can be defended against in this way, rate-limiting by itself won’t protect you from a sophisticated version of a DoS attack – a distributed DoS. Here distribution simply means that the attack is coming from multiple clients that seem unrelated and there is no real way to identify them as being controlled by the single malicious agent. Other methods need to be used to protect against such coordinated, distributed attacks.

But rate-limiting is useful and popular anyway, for less scary use-cases, like the API restriction one I mentioned. Given how rate-limiting works, since the server has to first check the limit conditions and enforce them if necessary, you need to think about what kind of data structure and database you’d want to use to make those checks super fast, so that you don’t slow down processing the request if it’s within allowed limits. Also, if you have it in-memory within the server itself, then you need to be able to guarantee that all requests from a given client will come to that server so that it can enforce the limits properly. To handle situations like this it’s popular to use a separate Redis service that sits outside the server, but holds the user’s details in-memory, and can quickly determine whether a user is within their permitted limits.

Rate limiting can be made as complicated as the rules you want to enforce, but the above section should cover the fundamentals and most common use-cases.

Section 12: Smaller Essentials

Logging

Over time your system will collect a lot of data. Most of this data is extremely useful. It can give you a view of the health of your system, its performance and problems. It can also give you valuable insight into who uses your system, how they use it, how often, which parts get used more or less, and so on.

This data is valuable for analytics, performance optimization and product improvement. It is also extremely valuable for debugging, not just when you log to your console during development, but in actually hunting down bugs in your test and production environments. So logs help in traceability and audits too.

The key trick to remember when logging is to view it as a sequence of consecutive events, which means the data becomes time-series data, and the tools and databases you use should be specifically designed to help work with that kind of data.

Monitoring

This is the next step after logging. It answers the question “What do I do with all that logging data?”: you monitor and analyze it. You build or use tools and services that parse that data and present it as dashboards, charts or other human-readable ways of making sense of it.

By storing the data in a specialized database designed to handle this kind of data (time-series data) you can plug in other tools that are built with that data structure and intention in mind.

Alerting

When you are actively monitoring you should also put a system in place to alert you of significant events. Just like having an alert for stock prices going over a certain ceiling or below a certain threshold, certain metrics that you’re watching may warrant an alert being sent if they go too high or too low. Response times (latency) or errors and failures are good ones to set up alerting for if they go above an “acceptable” level.
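As a sketch of what such an alerting rule might check, here is a small Python function that evaluates one monitoring window. The thresholds (500 ms 95th-percentile latency, 1% error rate), the sample numbers and the function name are all hypothetical.

```python
from statistics import quantiles

def alert_reasons(latencies_ms, errors, total, max_p95_ms=500.0, max_error_rate=0.01):
    """Check one monitoring window; return the reasons to alert (empty list = healthy)."""
    reasons = []
    if len(latencies_ms) >= 2:                     # quantiles() needs at least two samples
        p95 = quantiles(latencies_ms, n=100)[94]   # 95th percentile cut point
        if p95 > max_p95_ms:
            reasons.append(f"p95 latency {p95:.0f} ms exceeds {max_p95_ms:.0f} ms")
    if total and errors / total > max_error_rate:
        reasons.append(f"error rate {errors / total:.1%} exceeds {max_error_rate:.0%}")
    return reasons

healthy = alert_reasons([120] * 95 + [300] * 5, errors=2, total=1000)
degraded = alert_reasons([120] * 80 + [900] * 20, errors=40, total=1000)
print(healthy)   # []
print(degraded)  # one latency reason and one error-rate reason
```

Alerting on a percentile rather than the average is a deliberate choice: a slow tail that affects one user in twenty can hide behind a healthy-looking mean.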

The key to good logging and monitoring is to ensure your data is fairly consistent over time, as working with inconsistent data could result in missing fields that then break the analytical tools or reduce the benefits of the logging.

Resources

As promised, some useful resources are as follows:

A fantastic Github repo full of concepts, diagrams and study prep
Tushar Roy’s introduction to Systems Design
Gaurav Sen’s YouTube playlist
SQL vs NoSQL

I hope you enjoyed this long-form post!

I can be contacted on Twitter: @ZubinPratap

Postscript for freeCodeCamp students

I really, truly believe your most precious resources are your time, effort and money. Of these, the single most important resource is time, because the other two can be renewed and recovered. So if you’re going to spend time on something, make sure it gets you closer to your goals.

With that in mind, if you want to invest 3 hours with me to find your shortest path to learning to code (especially if you’re a career changer, like me), then head to my course site and use the form there to sign up (not the popup!). If you add the words “I LOVE CODE” to the message, I will know you’re a freeCodeCamp reader, and I will send you a promo code, because just like you, freeCodeCamp gave me a solid start.

Also, if you would like to learn more, check out episode 53 of the freeCodeCamp podcast, where Quincy (founder of freeCodeCamp) and I share our experiences as career changers that may help you on your journey. You can also access the podcast on iTunes, Stitcher and Spotify.