Relational Database Modeling — LinkedIn Schema Design — Part I
Hi All, Whenever a system has to be built there will be various entity we have to keep in mind. The first and the most important steps in building the optimized system is to design the schema and deciding how entites communicate between them. In this blog we can design a basic LinkedIn schema structure.
The schema design can be one-to-many, many-to-many. We should identify when to seperate two entities and when not to.
When we have to design a schema the first step is to write down all the features, identifying the entities and how they communicate with each other. In this example LinkedIn, we can take 6 feature and discuss the first 3 in this blog and the remaining in the next blog. Now let’s jump into the features.
1. User should be able to view/edit the profile containing the following information
a. On the top left we should show user’s profile photo.
b. On right side of profile pic, the name and contact information should be displayed.
Here for the first feature the entity is user. User has id, profile pic, name, contact information(mobile, email).
id: mandatory unique key for every entity.
profile_pic: Here we can either store the image in db. But it occupies more space and becomes clumsy. The best way is to store the image in storage like s3 and store the uri in the entity.
name: This can be just a string to contain name.
mobile/email: This is for storing the contact information of the user.
2. User can have multiple skill. As user types in we should be able to auto-suggest some skills.
Here we have two entities User and Skills. One user can have multiple skill and a skill can be common to one or more user. Here the relationship between User and skill entity is many-to-many.
We can have a look on three approaches on how to have these two inteconnected.
a. User profile can have skills as array.
But here the problem is when we are searching for the users with a particular skill say java we have to iterate through the skills array in all the user, which is not a good approach . And also since one or more user can have same skill the autopopulate will be difficult on getting unique skills.
b. The second option is to have a skill entity seperately and associate user_id in it. Here the problem is just to get the name of all the unique skills the time taken will be slow.
c. The best option is to create a mapping entity for user and skill and then associate them. Having a mapping collection will be best for the many-to-many relationship
3. A user may also choose to enter the education institution details.
b. start year
4. A user may also choose to enter his details about previous/current employees.
b. start year
Here since both have same attributes in it, we can have a common entity institute and have these details in it.
Here the institute type will be having information whether its an educational institute or an employee. What if education is having an extra attribute CGPA and the employee has an additional information on salary? Then we have to seperate it out.
We have to now map these institute/ company to the user. A user can have multiple institute/company and similary an institute/company can have multiple users. Its a many-to-many relationship. In this case the better approach is to create the user-institute and user-company mapping entity
It’s always simple to break down the system into small pieces/feature and then start building schema from it. Tune on till the next blog.