Scaling File System Metadata Using NoSQL Databases
- MOTIVATION AND INTRODUCTION
Metadata is information about other data that does not include
the data itself. In massively distributed file systems, the
metadata management service is the main scalability hurdle.
NoSQL databases are capable of storing distributed file system
metadata. This project demonstrates how to develop a distributed
file system using a NoSQL database. Our starting point is HopsFS,
a distribution of the Hadoop Distributed File System (HDFS).
HopsFS separates file system metadata storage from metadata
management, and stores all metadata in a highly available,
in-memory, distributed relational database.
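To make the idea of storing file system metadata in a NoSQL database more concrete, the following is a minimal sketch and not the HopsFS schema. It assumes each inode is stored as one document keyed by its parent's id and its name, and it uses a plain Python dictionary as a stand-in for a document collection. The field names and the helpers `make_inode` and `resolve_path` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Stand-in for a document collection: (parent_id, name) -> inode document.
# A real deployment would use a database collection; a dict keeps this runnable.
Collection = Dict[Tuple[int, str], dict]

ROOT_ID = 1  # assumed id of the root directory "/"


def make_inode(coll: Collection, inode_id: int, parent_id: int,
               name: str, is_dir: bool) -> dict:
    """Store one inode document (hypothetical fields) and return it."""
    doc = {"_id": inode_id, "parent_id": parent_id,
           "name": name, "is_dir": is_dir}
    coll[(parent_id, name)] = doc
    return doc


def resolve_path(coll: Collection, path: str) -> Optional[dict]:
    """Resolve an absolute path with one lookup per path component."""
    # The root is represented implicitly rather than stored in the collection.
    doc = {"_id": ROOT_ID, "parent_id": 0, "name": "", "is_dir": True}
    for part in (p for p in path.split("/") if p):
        doc = coll.get((doc["_id"], part))
        if doc is None:
            return None  # a component along the path does not exist
    return doc
```

For example, after `make_inode(coll, 2, ROOT_ID, "user", True)` and `make_inode(coll, 3, 2, "data.txt", False)`, calling `resolve_path(coll, "/user/data.txt")` returns the document with `_id` 3. Keying on the parent id and name means each path component costs one point lookup, which is the access pattern a metadata service would need to serve quickly.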
In this project, we will implement HopsFS [2] using a NoSQL
database. MongoDB is an open-source NoSQL database that supports
replication, automatic elections, and sharding. With automatic
elections, developers can configure a backup database that takes
over automatically if the original database fails. Sharding, in
turn, enables horizontal scaling. MongoDB is built entirely on a
distributed architecture; consequently, it provides data
localization with automated sharding, and replica sets for
always-on availability [1]. MongoDB is fast because it performs
fewer operations than relational database management systems
(RDBMS), although relational databases are slower for a reason:
the additional work enforces their safety guarantees.
- SOLUTION APPROACH
The main idea behind this project is to scale file system
metadata using a NoSQL database and to compare our results with
the NewSQL results reported for HopsFS. The comparison will be
based on features such as scalability, throughput, latency, and
overall performance. In other words, we will use a NoSQL
database, instead of NewSQL, as a test case for the HopsFS
approach.
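As an illustration of how throughput and latency could be collected for such a comparison, here is a minimal sketch. It is not the benchmark used by HopsFS; the `benchmark` and `percentile` helpers, the reported metric names, and the choice of the 99th percentile are assumptions made for this example. The operation being timed is passed in as a callable, so the same harness could wrap any metadata operation.

```python
import math
import time
from statistics import mean
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], fraction: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def benchmark(operation: Callable[[int], None], num_ops: int) -> Dict[str, float]:
    """Run `operation` num_ops times and report throughput and latency."""
    if num_ops <= 0:
        raise ValueError("num_ops must be positive")
    latencies: List[float] = []
    start = time.perf_counter()
    for i in range(num_ops):
        t0 = time.perf_counter()
        operation(i)  # e.g. one metadata operation such as creating a file
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "ops_per_sec": num_ops / elapsed,
        "mean_latency_s": mean(latencies),
        "p99_latency_s": percentile(latencies, 0.99),
    }
```

A call such as `benchmark(lambda i: store.__setitem__(i, {"name": f"file{i}"}), 1000)`, with `store` being an in-memory dictionary, exercises the harness without any database. Reporting a tail percentile alongside the mean is a deliberate choice here, since a few slow operations can be hidden by an average.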
Therefore, MongoDB is the tool that we will use for our project.
A MongoDB cluster allows a database to grow horizontally across
multiple servers via sharding, and to duplicate data for high
availability via replica sets, improving the overall performance
and reliability of the cluster.
- ANALYSIS
In this section, we outline how we will analyze throughput and
latency and compare our results with HopsFS [2]. Because we are
implementing the NoSQL approach with MongoDB, we expect benefits
such as high-performance writes, massive scalability, and support
for a wide range of modern programming languages and tools.
However, these benefits come at the expense of some relational
database safety features, such as referential integrity.
- RESPONSIBILITIES
The goal of the project is to implement HopsFS using a NoSQL
database. X will work on the implementation using MongoDB, and Y
will compare our implementation's results with the HopsFS results
in terms of scalability.
- REFERENCES
[1] W. Jiang, L. Zhang, X. Liao, H. Jin, and
Y. Peng. A novel clustered MongoDB-based
storage system for unstructured data with
high availability. Computing,
96(6):455–478, 2014.
[2] S. Niazi, M. Ismail, S. Haridi, J. Dowling,
S. Grohsschmiedt, and M. Ronström.
HopsFS: Scaling hierarchical file system
metadata using NewSQL databases. In
15th USENIX Conference on File and
Storage Technologies (FAST 17), pages
89–104, 2017.