Glossary
Here you can find some key computer science topics that often come up in interviews.
Warning
This list is intended as a starting point for brushing up on some topics, not as a complete resource that covers every aspect of a specific topic.
If you do not know any of the topics presented, please read more elsewhere.
API Gateway
An API gateway separates business logic from common operations such as:
authentication, authorization, DDoS protection, routing, serving static responses, caching responses, load balancing, and A/B testing.
Authentication vs Authorization
In simple terms, authentication is the process of verifying who a user is, while authorization is the process of verifying what they have access to.
Automation (CI/CD) TODO
Availability
The proportion of time a system remains operational and able to perform its required function over a specific period.
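As a rough illustration, availability is often expressed as the fraction of a period during which the system was up. The sketch below assumes that definition; the function names `availability` and `allowed_downtime_seconds` are hypothetical and chosen only for this example.

```python
def availability(uptime_seconds: float, total_seconds: float) -> float:
    """Fraction of the period during which the system was operational."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return uptime_seconds / total_seconds


def allowed_downtime_seconds(target: float, total_seconds: float) -> float:
    """Downtime budget implied by an availability target such as 0.999."""
    return (1.0 - target) * total_seconds


YEAR_SECONDS = 365 * 24 * 3600

# A 99.9% target ("three nines") leaves roughly 8.76 hours of downtime per year.
hours_per_year = allowed_downtime_seconds(0.999, YEAR_SECONDS) / 3600
```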
Bloom Filter
A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set.
The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set.
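The behavior described above can be sketched in a few lines. This is a minimal illustration, not a tuned implementation: the class name `BloomFilter`, the default sizes, and the choice of salted SHA-256 as the hash family are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter backed by a list of booleans."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "may be in the set".
        return all(self.bits[pos] for pos in self._positions(item))
```

An added element is always reported as possibly present, while an element that was never added is usually, but not always, reported as absent.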
CAP Theorem
CAP stands for: Consistency, Availability, Partition tolerance
This theorem states that a distributed system can guarantee at most 2 of these 3 properties at the same time.
Info
Since network partitions cannot be avoided in practice, almost all useful distributed systems must be partition tolerant, so the choice generally boils down to Consistency vs Availability.
CDN
A Content Delivery Network (CDN) is a geographically distributed group of servers that work together to provide fast delivery of Internet content.
Benefits:
- Improving website load times
- Reducing bandwidth costs
- Increasing content availability and redundancy
- Improving website security (DDoS mitigation)
Cache
Caching is the mechanism of storing data in a temporary storage location, usually in memory, so that requests to the data can be served faster.
Caching improves performance by decreasing page load times, and reduces the load on servers and databases.
Cache hit: the response is returned from the cache without asking the server or database for the data.
Cache miss: the request is sent to the server or database to fetch the result. The result is then stored in the cache and returned to the client.
Cache invalidation methods
When data is updated in the database, the corresponding cached copy has to be refreshed as well. This is called cache invalidation.
Method | Description |
---|---|
Write-through cache | Data is written to the cache and the database at the same time. |
Write-around cache | Data is written to the database without writing to the cache. Data is written to the cache only when a request results in a cache miss, at which point it is retrieved from the database, written to the cache, and sent back to the client. |
Write-back (write-behind) cache | Data is written to the cache without writing to the database immediately. The database is updated asynchronously. |
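The three write policies in the table can be compared with a small sketch. It assumes plain dictionaries stand in for both the cache and the database, and that an explicit `flush()` call stands in for the asynchronous database write; all class names are hypothetical.

```python
class _Cache:
    """Shared read path: serve from the cache, fall back to the database."""

    def __init__(self, db: dict) -> None:
        self.db = db            # a dict standing in for the real database
        self.cache: dict = {}

    def read(self, key):
        if key in self.cache:   # cache hit
            return self.cache[key]
        value = self.db[key]    # cache miss: fetch, then populate the cache
        self.cache[key] = value
        return value


class WriteThroughCache(_Cache):
    def write(self, key, value) -> None:
        # Cache and database are updated in the same operation.
        self.cache[key] = value
        self.db[key] = value


class WriteAroundCache(_Cache):
    def write(self, key, value) -> None:
        # Only the database is written; any stale cached copy is dropped.
        self.db[key] = value
        self.cache.pop(key, None)


class WriteBackCache(_Cache):
    def __init__(self, db: dict) -> None:
        super().__init__(db)
        self.dirty: set = set()

    def write(self, key, value) -> None:
        # Only the cache is written; the database write is deferred.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self) -> None:
        # Stands in for the asynchronous write to the database.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

The trade-off shows up in `write`: write-through pays for two writes up front, write-around keeps rarely read data out of the cache, and write-back is fastest but can lose data that has not been flushed yet.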
Eviction algorithms
- First In First Out (FIFO)
- Last In First Out (LIFO)
- Least Recently Used (LRU)
- Least Frequently Used (LFU)
- Least Frequent Recently Used (LFRU)
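As one example of an eviction algorithm, LRU can be sketched on top of `collections.OrderedDict` from the Python standard library. The class name `LRUCache` and its `get`/`put` interface are assumptions made for this illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was touched more recently.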
Consistent Hashing
A hashing technique such that when a hash table (mapping keys to machines) is resized, only n/m keys need to be remapped on average.
Symbol | Description |
---|---|
n | number of keys |
m | number of slots |
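One common way to get this property is a hash ring with virtual nodes. The sketch below is a minimal illustration under that assumption; the class name `HashRing`, the `replicas` parameter, and the use of MD5 purely as a placement hash are choices made for this example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used here only to spread points over the ring, not for security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas: int = 100) -> None:
        self.replicas = replicas   # virtual nodes per machine
        self._points = []          # sorted positions on the ring
        self._owner = {}           # ring position -> machine name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # The first point clockwise from the key's position owns the key.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a machine is added, the only keys that change owner are the ones that now map to the new machine; every other key stays where it was.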
DNS
The Domain Name System (DNS) is the phonebook of the Internet that maps domain names to IP addresses.
Database (TODO)
Which one to use? (SQL vs NoSQL)
Scaling (vertical vs horizontal/sharding)
Sharding:
The act of splitting a database into two or more pieces called shards.
Popular sharding strategies:
- based on client’s regions
- based on the type of data stored (user data in one shard; payments data stored in another)
- based on the hash of a column (only for structured data)
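The hash-based strategy from the list above can be sketched as a single function. The name `shard_for` and the use of SHA-256 are assumptions for this illustration; a stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a column value (for example a user id) to a shard index."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Note that with plain modulo placement, changing `num_shards` remaps most keys, which is the problem consistent hashing is meant to reduce.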
DB Replication
Backup
Vertical Scaling VS Horizontal Scaling
DoS
A Denial-of-Service (DoS) attack is typically carried out by flooding the targeted machine or resource with superfluous requests in an attempt to overload the system.
Eventual Consistency
A consistency model that is weaker than Strong Consistency.
In this model, reads might return a stale view of the system. An eventually consistent datastore guarantees that the state of the database will eventually reflect all writes, typically within a short time period (seconds or minutes).
Gossip protocol
A communication protocol that allows state sharing in distributed systems.
The protocol enables each node to keep track of state information about the other nodes in the cluster, such as which nodes are reachable.
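To give a feel for how state spreads, here is a toy simulation rather than a real membership protocol. It assumes that in every round each informed node forwards the update to a few randomly chosen peers; the function name `gossip_rounds` and the `fanout` and `seed` parameters are hypothetical.

```python
import random


def gossip_rounds(num_nodes: int, fanout: int = 2, seed: int = 0) -> int:
    """Return how many rounds it takes until every node has heard the update."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    informed = {0}              # node 0 starts with the new state
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        for _node in list(informed):
            # Each informed node forwards the state to a few random peers.
            peers = rng.sample(range(num_nodes), k=min(fanout, num_nodes))
            informed.update(peers)
    return rounds
```

Because the number of informed nodes can grow by a constant factor each round, the update typically reaches the whole cluster in a number of rounds that grows slowly with cluster size.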
HTTP
HTTP (Hypertext Transfer Protocol) is an (application layer) protocol designed to transfer information between networked devices.
A typical flow over HTTP involves a client machine making a request to a server, which then sends a response message.
HTTP Status Codes
Status Code | Description |
---|---|
1xx | indicates an informational message only |
2xx | indicates success of some kind |
3xx | redirects the client to another URL |
4xx | indicates an error on the client’s part |
5xx | indicates an error on the server’s part |
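The classes in the table follow from the first digit of the code, which a small helper can make explicit. The function name `status_class` and the label strings are hypothetical and used only for this sketch.

```python
def status_class(code: int) -> str:
    """Return the class of an HTTP status code based on its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]
```

For example, `status_class(404)` returns `"client error"` and `status_class(503)` returns `"server error"`.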
HTTP Methods
Method | Description |
---|---|
GET | Requests a representation of the specified resource. Requests using GET should only retrieve data. |
HEAD | Asks for a response identical to a GET request, but without the response body. |
POST | Submits an entity to the specified resource, often causing a change in state or side effects on the server. |
PUT | Replaces all current representations of the target resource with the request payload. |
DELETE | Deletes the specified resource. |
CONNECT | Establishes a tunnel to the server identified by the target resource. |
OPTIONS | Describes the communication options for the target resource. |
TRACE | Performs a message loop-back test along the path to the target resource. |
PATCH | Applies partial modifications to a resource. |
HTTPS
Hypertext Transfer Protocol Secure (HTTPS) is the secure version of HTTP.
Technically speaking, HTTPS is not a separate protocol from HTTP.
It is simply HTTP sent over a connection encrypted with TLS (formerly SSL).
HTTPS relies on TLS/SSL certificates, which verify that a particular provider is who they say they are.
JWT (TODO)
LoadBalancer
Reduces individual server load and prevents any application server from becoming a single point of failure by forwarding traffic only to healthy services.
It can use different algorithms to do so, such as Round Robin or load-aware balancing.
Algorithms
Method | Description |
---|---|
Least Connection Method | Routes the request to the server with the fewest active connections. |
Least Response Time Method | Routes the request to the server with the fewest active connections and the lowest average response time. |
Round Robin Method | Routes the request to the first available server and then moves that server to the end of the queue. |
Weighted Round Robin Method | Like Round Robin, but each server is assigned a weight, an integer based on its processing capacity, and servers with higher weights receive proportionally more requests. |
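Two of the methods from the table, Round Robin and Least Connection, can be sketched as follows. This is an illustration only: the class names and the `pick`/`acquire`/`release` interface are assumptions, and health checks are left out.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers) -> None:
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server that currently has the fewest active connections."""

    def __init__(self, servers) -> None:
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        # Ties are broken by the order in which servers were registered.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Called when a connection to the server is closed.
        self.active[server] -= 1
```

Round Robin needs no knowledge of server load, while Least Connection requires the balancer to track when each connection opens and closes.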
Message queue vs message broker
A message queue is a data structure, or a container - a way to hold messages for eventual consumption.
A message broker is a separate component that manages queues.
Brokers: Kafka, RabbitMQ, SNS/SQS
Monitoring (TODO)
Logging (TODO)
Levels: Debug -> Info -> Warn -> Error
Metrics (TODO)
Tracing
The goal of tracing is to follow a program’s flow and data progression.
Tracing allows you to see how you got there: which function, the function’s duration, parameters passed, and how deep into the function the user could get.
Polling vs streaming (WebSockets)
Short polling
The client repeatedly calls the server to retrieve data.
Pros | Cons |
---|---|
the server can be stateless | a large number of requests hit the server |
Long polling
The client makes a request to the server.
The server responds with the requested data or, if the data is not ready or not present, keeps the connection open and responds to the client once the data is available.
Pros | Cons |
---|---|
the server is not overloaded with requests | a connection between client and server is tied up |
Web sockets (TCP)
A connection is opened between client and server and kept open so that data can flow in both directions.
Pros | Cons |
---|---|
the server controls when to send data to the client | a connection stays open between them the whole time |
REST
REST is an architectural style, or design pattern, for APIs.
REST stands for REpresentational State Transfer. It means when a RESTful API is called, the server will transfer to the client a representation of the state of the requested resource.
A resource can be any object the API can provide information about.
Rate limiting
Used to control the rate of requests sent or received.
It can be used to mitigate DoS attacks or to limit API usage.
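One widely used way to implement rate limiting is a token bucket, sketched below. The source does not name a specific algorithm, so this is just one possible approach; the class name `TokenBucket` and its parameters are hypothetical, and the clock is injectable so the behavior can be checked without sleeping.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of requests per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token for this request
            return True
        return False                    # over the limit: reject or delay
```

A caller would check `allow()` before handling each request and, for an HTTP API, might answer rejected requests with status 429 (Too Many Requests).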
Redundancy
The duplication of components of a system in order to increase the system's reliability.
Reliability
The probability that a system will not fail in a given period. A distributed system is considered reliable if it keeps delivering its services even when one or several of its software or hardware components fail.
Such resilience is achieved through redundancy, which eliminates single points of failure but comes at a cost.
SLA - Service-Level Agreement
The agreement you make with your users.
Typically makes guarantees about a system’s availability.
SLO - Service-Level Objective
The objectives your team must hit to meet that agreement.
Usually about specific metrics like uptime or response time.
Scalability
The capability of a system, process, or network to grow and manage increased demand.
Scaling Type | Description |
---|---|
Horizontal | More machines |
Vertical | More power |
Stateful (TODO)
dedicated storage
Stateless (TODO)
shared storage
TCP vs UDP
UDP is faster than TCP but less reliable.
TCP establishes a connection via a “handshake” (SYN -> SYN-ACK -> ACK).
TCP numbers packets and acknowledges them, so data is delivered reliably and in order. UDP does not.
Because of that, UDP is much faster but can lose packets (datagrams).
Applications that use UDP must therefore be able to tolerate errors (loss and duplication).
UDP is commonly used in time-sensitive communication where occasionally dropping packets is better than waiting (such as voice and video applications, online gaming, and DNS lookups).
TLS
TLS uses a technology called public key encryption:
there are two keys, a public key and a private key, and the public key is shared with client devices via the server's SSL certificate.
When a client opens a connection with a server, the two devices use the public and private key to agree on new keys, called session keys, during the TLS handshake. The session keys are then used to encrypt further communication between them.
Thread vs Process
A process is a program in execution; the scheduler dispatches it from the ready state to run on the CPU.
A process can create other processes, which are known as child processes.
A process is isolated (it does not share memory with other processes).
A thread is a unit of execution within a process (a process can have multiple threads).
Threads are cheaper to create and switch between than processes, and threads of the same process share memory.
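The shared-memory point can be illustrated with Python's `threading` module. In this sketch several threads update the same object, and a lock keeps the updates from interfering with each other; the function name `count_with_threads` is hypothetical.

```python
import threading


def count_with_threads(num_threads: int, increments: int) -> int:
    """Have several threads increment one shared counter and return the total."""
    counter = {"value": 0}      # one object, visible to every thread in the process
    lock = threading.Lock()

    def work() -> None:
        for _ in range(increments):
            with lock:          # shared memory needs synchronization
                counter["value"] += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]
```

Separate processes would each get their own copy of `counter`, so they would need an explicit mechanism such as pipes, sockets, or shared memory segments to exchange the result.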
pub/sub
A messaging model that consists of publishers and subscribers.
Publishers publish messages to topics (also called channels) without knowing who will read those messages.
Subscribers subscribe to topics and read messages coming through those topics.
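The model can be sketched with a tiny in-memory broker. This is an illustration only, with synchronous delivery and no persistence; the class name `Broker` and its methods are hypothetical and do not correspond to the API of any real message broker.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Hypothetical in-memory pub/sub broker with synchronous delivery."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)   # topic -> list of handlers

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # The publisher does not know who is listening; the broker fans out.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)                    # number of subscribers reached
```

For example, after `broker.subscribe("news", inbox.append)`, a call to `broker.publish("news", "hello")` appends `"hello"` to `inbox`, while publishing to a topic with no subscribers delivers nothing.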