System Design Deep Dive: Building a Music Streaming Service

In today’s digital age, music streaming services have become an integral part of our daily lives. Among them, platforms like Spotify lead the pack with their vast libraries and intuitive user experiences. Designing such a service involves a complex interplay of algorithms, databases, and networking. This article delves into the core components required to build a scalable and reliable music streaming service.

Core Features of a Music Streaming Service

Before diving into the system design, let’s outline the fundamental features our service will offer:

  1. Music Playback: Allows users to stream music tracks.
  2. Search Functionality: Enables users to find songs, albums, artists, and playlists.
  3. User Account Management: Supports user registration, authentication, and profile management.
  4. Playlist Creation and Management: Users can create, share, and edit playlists.
  5. Recommendation Engine: Suggests music based on listening habits.

Initial Phase: Base Version

Requirements:

  • Users: 1 million, who play the songs
  • Songs: 20 million
  • Artists: 100,000, who upload the songs

System Architecture Overview

Designing a service capable of handling millions of concurrent users requires a robust architecture. Below is a simplified overview:

  • App: Web or mobile app through which users will interact with the music streaming service
  • Web Servers: Handle API requests such as user authentication, metadata retrieval, and search queries.
  • Load Balancers: Distribute incoming requests evenly across a network of servers to prevent any single point of overload.
  • Application Servers: Process business logic, including playlist management and music recommendation algorithms.
  • Database Storage: Structured data such as user data, music metadata, and playlists is stored in a SQL database, since SQL handles relationships between entities and supports complex queries efficiently.
  • Blob Storage: The song files are stored in a Blob (Binary Large Object) storage service, e.g. Cloudflare R2, AWS S3, Google Cloud Storage, or Azure Blob Storage, which are built for large unstructured data. This allows for efficient storage and retrieval.
  • CDN (Content Delivery Network): Distributes music files globally to minimize latency during playback, using a CDN service, e.g. CloudFront or Cloudflare.
  • Cache Layer: Improves data retrieval performance by temporarily storing frequently accessed data. We can use the LRU (Least Recently Used) caching strategy so that popular songs stay cached, while unpopular songs are cached on demand and evicted first when space runs out. This is usually implemented by the CDN service provider.
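
The LRU strategy described in the Cache Layer bullet can be sketched in a few lines. This is a minimal in-memory illustration, not how a CDN provider implements it; the class name LruCache, the capacity of 2, and the song keys are all assumptions made for the example.

```typescript
// Minimal LRU cache sketch. A Map iterates its keys in insertion order,
// so the first key is always the least recently used one.
class LruCache<K, V> {
  private readonly capacity: number;
  private readonly entries = new Map<K, V>();

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // Evict the least recently used entry (the oldest key in the Map).
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }
}

// Hypothetical usage: a cache that holds at most two songs.
const songCache = new LruCache<string, string>(2);
songCache.set("song-1", "bytes of song 1");
songCache.set("song-2", "bytes of song 2");
songCache.get("song-1"); // song-1 is now the most recently used
songCache.set("song-3", "bytes of song 3"); // evicts song-2
```

A song that keeps getting played is re-inserted on every read, so it stays in the cache, while a song nobody requests drifts to the front of the Map and is evicted first.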

Storage Estimation

Song Storage

  • Assumption: Average song size is 3MB.
  • Calculation: With 20 million songs, the total storage needed is 3MB×20,000,000=60,000GB or 60TB

Song Metadata Storage

  • Assumption: Average metadata size per song is about 100 bytes.
  • Calculation: For 20 million songs, 100 bytes×20,000,000=2GB

User Metadata Storage

  • Assumption: On average, 1KB of data per user.
  • Calculation: For 1 million users, 1KB×1,000,000=1GB
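
The three estimates above can be reproduced with a short calculation. This sketch assumes decimal units (1 MB = 1,000,000 bytes), which is what the figures above imply; the helper name totalBytes is made up for the example.

```typescript
// Decimal storage units, matching the estimates above.
const KB = 1e3;
const MB = 1e6;
const GB = 1e9;
const TB = 1e12;

// Total bytes needed for `count` items of `bytesEach` bytes each.
function totalBytes(count: number, bytesEach: number): number {
  return count * bytesEach;
}

const songStorage = totalBytes(20_000_000, 3 * MB); // 20 million songs at 3MB
const songMetadata = totalBytes(20_000_000, 100); // 100 bytes per song
const userMetadata = totalBytes(1_000_000, 1 * KB); // 1KB per user

console.log(songStorage / TB, "TB of song files"); // 60
console.log(songMetadata / GB, "GB of song metadata"); // 2
console.log(userMetadata / GB, "GB of user metadata"); // 1
```

The figures cover a single copy of the data. Replicas and backups would multiply them.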

Key Components Explained

1. Data Model – SQL Database Structure

User table:

Song table:

Artist table:

Relationships:
A join table links the Artist and Song tables. Each row holds an artistID (foreign key pointing to the Artist table) and a songID (foreign key pointing to the Song table). From there, we can get the song metadata, which also contains the fileURL property pointing to the location in Blob storage where the song is stored.
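
To make the relationship concrete, here is a rough in-memory sketch of the lookup. Only artistID, songID, and fileURL come from the description above; every other field name, the sample rows, and the example.com URLs are hypothetical, and a real implementation would express this as a SQL join.

```typescript
// Field names other than artistID, songID, and fileURL are assumptions.
interface Artist {
  artistID: number;
  artistName: string;
}

interface Song {
  songID: number;
  title: string;
  fileURL: string; // points to the song file in Blob storage
}

// One row of the join table: links one artist to one song.
interface ArtistSong {
  artistID: number;
  songID: number;
}

// Follow the join table from an artist to that artist's songs.
function songsByArtist(artistID: number, links: ArtistSong[], songs: Song[]): Song[] {
  const songsById = new Map<number, Song>();
  songs.forEach((song) => songsById.set(song.songID, song));

  const result: Song[] = [];
  links.forEach((link) => {
    if (link.artistID !== artistID) return;
    const song = songsById.get(link.songID);
    if (song !== undefined) result.push(song);
  });
  return result;
}

// Hypothetical sample rows.
const sampleSongs: Song[] = [
  { songID: 10, title: "First Song", fileURL: "https://blobs.example.com/songs/10.mp3" },
  { songID: 11, title: "Second Song", fileURL: "https://blobs.example.com/songs/11.mp3" },
];
const sampleLinks: ArtistSong[] = [
  { artistID: 1, songID: 10 },
  { artistID: 2, songID: 11 },
  { artistID: 1, songID: 11 }, // a song can have more than one artist
];

console.log(songsByArtist(1, sampleLinks, sampleSongs).map((s) => s.fileURL));
```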

2. Efficient Search Mechanism

Implementing a fast and accurate search feature requires indexing and a robust search algorithm. Elasticsearch is a popular choice for this purpose, given its scalability and speed.
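
The indexing idea behind engines like Elasticsearch is an inverted index: a map from each word to the items that contain it. The sketch below is a toy version written for this article, not the Elasticsearch API; the class name SearchIndex and the sample titles are assumptions.

```typescript
// Split text into lowercase alphanumeric tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

class SearchIndex {
  // token -> IDs of the songs whose text contains that token
  private readonly postings = new Map<string, Set<number>>();

  add(id: number, text: string): void {
    tokenize(text).forEach((token) => {
      let ids = this.postings.get(token);
      if (ids === undefined) {
        ids = new Set<number>();
        this.postings.set(token, ids);
      }
      ids.add(id);
    });
  }

  // Return the IDs of songs that contain every token of the query.
  search(query: string): number[] {
    let result: number[] | null = null;
    for (const token of tokenize(query)) {
      const ids = this.postings.get(token);
      if (ids === undefined) return []; // one unknown word means no match
      const current: number[] =
        result === null ? Array.from(ids) : result.filter((id) => ids.has(id));
      result = current;
    }
    return (result ?? []).sort((a, b) => a - b);
  }
}

// Hypothetical song titles.
const titleIndex = new SearchIndex();
titleIndex.add(1, "Blinding Lights");
titleIndex.add(2, "City Lights");
titleIndex.add(3, "Blinding Sun");

console.log(titleIndex.search("lights")); // [1, 2]
console.log(titleIndex.search("blinding lights")); // [1]
```

A lookup touches only the entries for the query words instead of scanning all 20 million titles, which is why indexing is what makes search fast.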

3. Personalized Music Recommendation

Machine learning algorithms analyze user listening habits to provide personalized music recommendations. This involves processing large datasets to identify patterns and preferences.

Putting it all together – Initial Phase

Scaled Phase: Expansion to 500 Million Users

Requirements

  • Users: 500 million
  • Songs: 100 million
  • Artists: 11 million

Scaling to half a billion users and expanding the music library fivefold presents significant challenges. The architecture must not only support increased load but also maintain, if not improve, the quality of service.

Scaling Strategies

  • Microservices Architecture: Breaking down the application into microservices allows for easier scaling and maintenance.
  • Advanced Load Balancing: Implementing more sophisticated load balancing techniques to distribute traffic efficiently across servers worldwide.
    • Global Server Load Balancing (GSLB): Distributes traffic across multiple data centers based on location, improving speed and reliability.
    • Layer 7 Load Balancing: Makes routing decisions based on the content of HTTP/HTTPS headers, allowing for intelligent traffic distribution.
    • Content-Aware Load Balancing: Routes requests based on content type or user behavior, optimizing resource use for different types of traffic.
    • Adaptive Load Balancing: Dynamically adjusts routing based on current server load and network conditions, enhancing performance.
    • Machine Learning-Driven Load Balancing: Uses predictive analytics to optimize traffic distribution, improving over time as it learns traffic patterns.
  • Scaling the database with the Leader – Follower technique: Most users only perform read operations (playing and browsing songs), while a much smaller number of artists perform both reads and writes (uploading songs). We can have a Leader database that handles both reads and writes, with multiple Follower databases that replicate from it and serve read-only traffic.
  • Data Sharding and Replication with the Leader – Leader technique: Segmenting the database into smaller, manageable parts (shards) improves performance, and letting more than one leader accept writes helps keep data available if one leader fails.
  • Enhanced CDN Strategies: Utilizing multiple CDNs to reduce latency further and handle the increased traffic.
  • Sophisticated Machine Learning Models: Implementing more complex algorithms for the recommendation engine to handle the larger dataset and provide more accurate suggestions.
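
Two of the database strategies above, Leader – Follower routing and sharding, can be sketched as follows. This is an illustration under assumptions, not a production router: the class and function names, the server names, and the choice of round-robin reads and an FNV-1a-style hash are all made up for the example.

```typescript
type Operation = "read" | "write";

class ReplicatedDatabase {
  private readonly leader: string;
  private readonly followers: string[];
  private nextFollower = 0;

  constructor(leader: string, followers: string[]) {
    this.leader = leader;
    this.followers = followers;
  }

  // Writes always go to the leader; reads rotate across the followers.
  route(operation: Operation): string {
    if (operation === "write" || this.followers.length === 0) {
      return this.leader;
    }
    const target = this.followers[this.nextFollower % this.followers.length] as string;
    this.nextFollower += 1;
    return target;
  }
}

// Map a key (for example a user ID) to a shard number in [0, shardCount).
// Uses an FNV-1a-style hash over the string's UTF-16 code units.
function shardFor(key: string, shardCount: number): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % shardCount;
}

// Hypothetical server names.
const database = new ReplicatedDatabase("leader-db", ["follower-1", "follower-2"]);
console.log(database.route("write")); // leader-db
console.log(database.route("read")); // follower-1
console.log(database.route("read")); // follower-2
console.log(shardFor("user-42", 8)); // the same key always maps to the same shard
```

One caveat of plain modulo sharding is that changing shardCount remaps most keys, which is why larger systems often reach for consistent hashing instead.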

Scaled Phase: Expanded Version Storage Estimation

Song Storage

  • Assumption: Maintaining the average song size of 3MB.
  • Calculation: For 100 million songs, the required storage expands to 3MB×100,000,000=300,000GB or 300TB

Song Metadata Storage

  • Assumption: The metadata size remains at about 100 bytes per song.
  • Calculation: For 100 million songs, 100 bytes×100,000,000=10GB

User Metadata Storage

  • Assumption: Keeping the average at 1KB of data per user.
  • Calculation: For 500 million users, 1KB×500,000,000=500GB

Putting it all together – Scaled Phase


How to Connect to GitHub or a Web Server Securely with SSH

Introduction:
Connecting to GitHub or a web server requires a secure method to protect your sensitive data. While password authentication is a common approach, it can be easily compromised if the password is weak or guessable. In this blog post, we’ll explore a more secure way to connect using SSH (Secure Shell).

If for some reason you choose to stick with passwords, I recommend a password manager such as 1Password, which helps you create and manage passwords more securely.

The Importance of Security:
When it comes to the security of your GitHub page or web server, it’s crucial to prioritize strong authentication methods. While password-based authentication is relatively convenient, it may not provide sufficient protection against unauthorized access. To enhance security, it’s highly recommended to use SSH.

Understanding Encryption:
Before we delve into SSH, it helps to separate hashing from encryption. MD5, SHA-1, and SHA-256 are hash functions, not encryption algorithms: they map input data to a fixed-size digest and are one-way by design, so there is nothing to decrypt. Because these algorithms are public, an attacker can precompute the digests of common passwords and look them up. Combining the input with a randomly generated salt before hashing defeats such precomputed lookups, and the salt does not need to stay secret to do so. Note that MD5 and SHA-1 are no longer considered secure. SSH itself relies on public-key cryptography: a private key that never leaves your machine and a public key that you share with the server.

Generating SSH Keys:
To establish an SSH connection, you need to generate SSH keys. Follow these steps:

  1. Open your terminal and navigate to the SSH folder by running the following command: cd ~/.ssh
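  2. Generate the public and private keys using the command: ssh-keygen. When prompted for a file name, enter the name you want for the key. This post uses kirandash_github.

This command generates two files: the private key (kirandash_github) and the public key (kirandash_github.pub).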

Configuring SSH:
To instruct SSH to use your private key for every connection attempt, you need to modify the SSH config file. Here’s how:

  1. Open the SSH config file using a text editor: vi ~/.ssh/config
  2. Add the following content to the config file:
     Host *
       AddKeysToAgent yes
       UseKeychain yes
       IdentityFile ~/.ssh/kirandash_github

Adding Key to Apple Keychain:
To streamline the SSH authentication process, you can add the private key to Apple Keychain. Follow these steps:

  1. Run the command: ssh-add -K kirandash_github

This command adds the private key to the Apple Keychain. On macOS 12 and later, the -K flag is deprecated in favor of --apple-use-keychain.

Configuring GitHub:
To establish a connection with your GitHub account, you need to add the public key. Here’s how:

  1. Display the contents of the public key by running the command: cat kirandash_github.pub
  2. Copy the public key.
  3. Go to your GitHub account settings and find the SSH key settings page.
  4. Add the copied public key to the list of authorized keys.

Conclusion:
By following these steps, you can connect to GitHub or a web server securely using SSH. SSH provides a stronger authentication mechanism compared to password-based authentication, enhancing the overall security of your system. Implementing these measures will help protect your data and ensure a safer connection to your remote servers.

Remember, prioritizing security is crucial in today’s digital landscape, and SSH is an essential tool in achieving that goal.

Tools I use as Lead Frontend Developer

My team members and viewers of my YouTube channel often ask me which software and tools I use. This post lists almost every tool I use as a developer.

Tech

  • React – The most widely used frontend framework in the world. I previously used Angular but switched to React in 2018.
  • TypeScript – It has helped me avoid tons of bugs in my JavaScript projects.
  • Testing Library – A great testing library for anything that interacts with the DOM. If you are still using Enzyme, it’s time to switch.
  • Jest – A great testing framework.
  • Cypress.io – I use this for E2E testing.
  • Axios – Promise-based HTTP client for the browser and Node.js.
  • msw – Mock Service Worker, which makes it easy to mock network requests.
  • Husky, Prettier, pretty-quick, Commitlint
  • Styled Components – A great way to keep my styling consistent and stay productive.
  • React Redux – Any time I have a complex state problem, I use this.
  • Formik – For handling complex forms
  • Storybook – For documentation

Services

  • SonarCloud – For Code Quality check (Needs subscription)
  • Sentry – For application performance monitoring
  • Fly.io – My preferred hosting platform
  • Tekton CI
  • Argo CD
  • GitHub – Where I host my code. I also run CI/CD pipelines via GitHub Actions.
  • GoDaddy – Where I buy all my domain names.

Editor

Chrome Extensions

Desktop Apps

Covid Tracker React App

Create COVID Tracker – A Modern scalable React JS Application with React, Redux, Thunks, Selectors and Styled Components

Intro

In this tutorial we are going to build a COVID (Coronavirus Disease) Tracker, a modern, scalable application built with React JS and four advanced React tools: Redux, Thunks, Selectors and Styled Components. React JS alone is enough for creating simple applications, but if you want to build large, high-performance applications, your job becomes much simpler once you know how to use these additional tools.

For the application concept, I wanted to create something that would be useful to me in the current situation. The current situation in the world is not good: COVID-19, the Coronavirus Disease, has affected the lives of people in many countries. I am currently staying in Singapore and have been working from home for the last two weeks. As a developer, I thought it would be a good use of my weekend to create a tutorial on building an app that tracks coronavirus reports from different countries.

Demo:

http://bgwebagency.in/projects/ui/covid-tracker/

Features:

  1. Search Country Report: Country Codes you can use for testing: GB (United Kingdom), US (USA), SG (Singapore), GE (Georgia), IN (India), IT (Italy), ES (Spain).
  2. Pin Country Result to move a result to the very top.
  3. Remove a Country from the list.
  4. Persist the results across page reloads. Note: The results are not saved in any database. We will save them in localStorage, since our only focus here is frontend development with React.

Prerequisites for this Tutorial:

To get the most out of this tutorial, it helps if you already know the following:

  1. HTML, CSS
  2. Basic React (Optional but Helpful)

Software and APIs Required for this Tutorial:

  1. nodejs: https://nodejs.org/en/
  2. API: We are using an API which returns the latest report for a country based on its country code.
    https://api.thevirustracker.com/free-api?countryTotal=US
  3. More APIs from the same API provider:
    https://thevirustracker.com/api
  4. Alternate Similar APIs: (If the previous one is not working)
    https://corona.lmao.ninja/v2/countries/SG
    https://covid-19-apis.postman.com/
    or any other API that returns report for a single country.
  5. API for global stats: (For Task after Tutorial)
    https://api.thevirustracker.com/free-api?global=stats
  6. Alternate Similar API for global stats: (For Task after Tutorial)
    https://corona.lmao.ninja/v2/all
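
As a rough sketch of how the country endpoint listed above might be called, the helper below builds the request URL from a country code. The function names and the injected fetchJson parameter are assumptions made for this example, and the response is left as unknown because its shape is not described here.

```typescript
const API_BASE = "https://api.thevirustracker.com/free-api";

// Build the per-country URL, e.g. for "SG" or "us".
function countryReportUrl(countryCode: string): string {
  const code = countryCode.trim().toUpperCase();
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new Error("Expected a two-letter country code, got: " + countryCode);
  }
  return API_BASE + "?countryTotal=" + code;
}

// Anything that takes a URL and resolves to parsed JSON.
type JsonFetcher = (url: string) => Promise<unknown>;

// The fetcher is passed in so the function can be tried without a network.
async function fetchCountryReport(countryCode: string, fetchJson: JsonFetcher): Promise<unknown> {
  return fetchJson(countryReportUrl(countryCode));
}

console.log(countryReportUrl("sg"));
// https://api.thevirustracker.com/free-api?countryTotal=SG
```

In the browser, the fetcher could be as simple as (url) => fetch(url).then((response) => response.json()).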

What this Tutorial Will Cover?

  1. Introduction, and Project Setup:
    We will setup the project and understand the project structure.
  2. Build the Application view by creating components in React JS.
    Important Sections:
  3. Manage state of the application with Redux.
  4. Handle API/Asynchronous calls with Thunks.
  5. Selectors: A middle layer between API layer and Component View.
  6. Styled Components: For handling CSS in a smart way, from JS file instead of creating separate CSS file.
  7. Build app for Production deployment.

Why use the Advanced React Tools?

Every application mainly consists of three important layers: an API layer to get data from the APIs, a data layer where we handle the data from the API and shape it to our requirements, and, last but not least, a view layer to show the data.

React JS was designed mainly to take care of the view. React is powerful for showing data, but for other tasks such as calling APIs and managing data, it can do the job without being especially good at it, because React does not prescribe any standard for managing state or performing API calls. That is fine for a small application, but on a large application with a team, each developer will handle these things in their own way. Without a shared set of rules, the code becomes cluttered and extremely difficult to debug later. That is where these tools help: they provide extra rules for how to do things. For example, Redux takes care of data or state management by adding some standard rules. Similarly, Thunks give a standard way of calling APIs, and Styled Components give a specific way of handling CSS.

In summary, these extra tools help us organise the application in a more standard way by separating responsibilities among different tools instead of handling everything with React. As a result, the application is easier to manage and expand.
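
To give a feel for this separation of responsibilities, here is a hand-rolled sketch of the three pieces for the tracker's search, pin, and remove features. It does not use the redux or redux-thunk packages and is not the code from the GitHub project: the state shape, action names, and the loadTotalCases stand-in for the API call are all assumptions.

```typescript
// Hypothetical state shape for the tracker.
interface CountryReport {
  code: string;
  totalCases: number;
  pinned: boolean;
}

interface AppState {
  countries: CountryReport[];
  isLoading: boolean;
}

type Action =
  | { type: "LOAD_STARTED" }
  | { type: "COUNTRY_LOADED"; report: CountryReport }
  | { type: "COUNTRY_PINNED"; code: string }
  | { type: "COUNTRY_REMOVED"; code: string };

const initialState: AppState = { countries: [], isLoading: false };

// Data layer (the Redux idea): a pure function from (state, action) to new state.
function reducer(state: AppState, action: Action): AppState {
  switch (action.type) {
    case "LOAD_STARTED":
      return { ...state, isLoading: true };
    case "COUNTRY_LOADED":
      return { countries: [...state.countries, action.report], isLoading: false };
    case "COUNTRY_PINNED": {
      const { code } = action;
      const countries = state.countries.map((c) => (c.code === code ? { ...c, pinned: true } : c));
      return { ...state, countries };
    }
    case "COUNTRY_REMOVED": {
      const { code } = action;
      return { ...state, countries: state.countries.filter((c) => c.code !== code) };
    }
    default:
      return state;
  }
}

type Dispatch = (action: Action) => void;
// Stand-in for the real API call; resolves to a country's total case count.
type LoadTotalCases = (code: string) => Promise<number>;

// API layer (the thunk idea): a function that returns a function taking dispatch.
function loadCountry(code: string, loadTotalCases: LoadTotalCases) {
  return async (dispatch: Dispatch): Promise<void> => {
    dispatch({ type: "LOAD_STARTED" });
    const totalCases = await loadTotalCases(code);
    dispatch({ type: "COUNTRY_LOADED", report: { code, totalCases, pinned: false } });
  };
}

// Selectors: derive exactly what the view needs from the state.
const selectPinned = (state: AppState): CountryReport[] => state.countries.filter((c) => c.pinned);
const selectNotPinned = (state: AppState): CountryReport[] => state.countries.filter((c) => !c.pinned);
```

The view only ever dispatches actions and reads through selectors, so every developer on the team changes state the same way.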

Tasks for you after Tutorial:

  1. Create an Unpin Country button. Clicking it moves a pinned country back to the Not Pinned Countries section.
  2. Create another React component to show the global stats of total cases, from the API: https://api.thevirustracker.com/free-api?global=stats. Follow the same flow: first create a GlobalStats.js component, then add a Redux globalstats reducer, add selectors, and finally add styles with Styled Components.
  3. Modify the reducer code to remove the isLoading reducer and make isLoading a property of state.countries instead of a direct child of the state. Since we are adding a new API, each API needs its own isLoading property, so state.globalstats needs an isLoading property as well.

Important links:

App Demo: http://bgwebagency.in/projects/ui/covid-tracker/

Github Project: https://github.com/kirandash/covid-tracker

Follow Me On Github: https://github.com/kirandash

Follow Me On Twitter: https://twitter.com/TheKiranDash

Coding Challenges for JavaScript Interview

In this tutorial we will go through JavaScript Interview Questions and answers.

Questions Discussed:

  1. Create a function which takes a number as a string and returns the number as a string without trailing and leading zeros.
  2. Create a function which takes an object as input and converts it into an array of keys and values.
  3. Create a function which takes an array of numbers as input and returns the smallest number in the array.
  4. Create a function which takes a string as input and returns true if the string contains any number.
  5. Create a function which calculates the length of a nested array.
  6. Create a function which takes two dates as input and returns the number of days between them.
  7. Create a function which removes the last vowel from every word in a sentence.

Watch the video shared above for solutions.
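
If you want something to compare your own attempts against, here is one possible approach to questions 3, 4, 6 and 7. These are sketches written for this post and not necessarily the solutions from the video; the function names are made up, and question 7 assumes that words are separated by single spaces.

```typescript
// Question 3: smallest number in an array.
function smallest(numbers: number[]): number {
  if (numbers.length === 0) throw new Error("The array must not be empty");
  return numbers.reduce((min, n) => (n < min ? n : min), Infinity);
}

// Question 4: true if the string contains at least one digit.
function containsNumber(text: string): boolean {
  return /\d/.test(text);
}

// Question 6: whole days between two dates, in either order.
function daysBetween(a: Date, b: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.round(Math.abs(b.getTime() - a.getTime()) / msPerDay);
}

// Question 7: remove the last vowel from every word in a sentence.
function removeLastVowels(sentence: string): string {
  return sentence
    .split(" ")
    // Match a vowel that is followed only by non-vowels until the end of the word.
    .map((word) => word.replace(/[aeiou](?=[^aeiou]*$)/i, ""))
    .join(" ");
}

console.log(smallest([3, -2, 7])); // -2
console.log(containsNumber("abc1")); // true
console.log(daysBetween(new Date(Date.UTC(2020, 0, 1)), new Date(Date.UTC(2020, 0, 31)))); // 30
console.log(removeLastVowels("hello world")); // "hell wrld"
```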

Important links:

Github Project: https://github.com/kirandash/challenges

Follow Me On Github: https://github.com/kirandash

Follow Me On Twitter: https://twitter.com/kirankdash