Project Proposal Report

Scaling File System Metadata Using NoSQL Databases

  1. MOTIVATION AND INTRODUCTION
    Metadata is information that describes other
    data but does not include the data itself. In
    massively distributed file systems, the metadata
    management service is the main scalability
    hurdle. NoSQL databases are able to store
    distributed file system metadata. This project
    demonstrates how to develop a distributed file
    system on top of a NoSQL database. We will use
    HopsFS, which is a distribution of the Hadoop
    Distributed File System (HDFS). HopsFS separates
    file system metadata storage from metadata
    management functions, so that all metadata can
    be stored in a highly available, in-memory,
    distributed, relational database.
    In this project, we will implement HopsFS [2]
    using a NoSQL database. MongoDB, an open-source
    NoSQL database, provides replication, automatic
    elections, and sharding (fragmenting). With
    automatic elections, developers can configure a
    backup database to take over automatically if
    the original database fails. Sharding, on the
    other hand, enables horizontal scaling. MongoDB
    is built entirely on a distributed architecture;
    consequently, it provides data localization with
    automated sharding, and replica sets provide
    always-on availability [1]. MongoDB is fast
    because it performs fewer operations than
    relational database management systems (RDBMS);
    relational databases, however, pay that cost
    for a reason, since the extra work enforces
    safety features such as referential integrity.
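    As a minimal sketch of what storing file system
    metadata as documents could look like, the
    example below models each file or directory as
    one record keyed by (parent id, name) and
    resolves a path one component at a time. The
    field names, the ROOT_ID value, and the
    MetadataStore class are hypothetical; the class
    is an in-memory stand-in for a document
    collection, not the HopsFS schema and not
    MongoDB code.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional, Tuple

ROOT_ID = 1  # hypothetical id reserved for the root directory "/"


@dataclass(frozen=True)
class Inode:
    """One metadata record; asdict() yields the document a store would hold."""
    inode_id: int
    parent_id: int
    name: str
    is_dir: bool
    size: int = 0


class MetadataStore:
    """In-memory stand-in for a collection indexed by (parent_id, name)."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[int, str], dict] = {}
        self._next_id = ROOT_ID + 1

    def create(self, parent_id: int, name: str, is_dir: bool, size: int = 0) -> dict:
        """Add a file or directory entry under the given parent directory."""
        key = (parent_id, name)
        if key in self._docs:
            raise FileExistsError(name)
        doc = asdict(Inode(self._next_id, parent_id, name, is_dir, size))
        self._next_id += 1
        self._docs[key] = doc
        return doc

    def resolve(self, path: str) -> Optional[dict]:
        """Walk the path from the root, doing one keyed lookup per component."""
        current: Optional[dict] = {
            "inode_id": ROOT_ID, "parent_id": 0, "name": "",
            "is_dir": True, "size": 0,
        }
        for part in (p for p in path.split("/") if p):
            current = self._docs.get((current["inode_id"], part))
            if current is None:
                return None  # some component of the path does not exist
        return current


if __name__ == "__main__":
    store = MetadataStore()
    user = store.create(ROOT_ID, "user", is_dir=True)
    store.create(user["inode_id"], "data.txt", is_dir=False, size=42)
    print(store.resolve("/user/data.txt"))
```

    Keying each record by its parent directory means
    that listing a directory or resolving one path
    component is a single keyed lookup, which is the
    access pattern a metadata service issues most
    often.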
  2. SOLUTION APPROACH
    The main idea behind this project is to scale
    file system metadata using a NoSQL database and
    to compare the results with those obtained with
    a NewSQL database. The comparison will be based
    on metrics such as scalability, throughput,
    latency, and overall performance. We would like
    to use NoSQL as a test case for the HopsFS
    approach in place of NewSQL. Therefore, MongoDB
    is the tool that we will use for our project. A
    MongoDB cluster can grow a database horizontally
    across multiple servers via sharding, or
    duplicate data across servers using replica sets
    to provide high availability, improving the
    overall performance and reliability of the
    cluster.
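    To illustrate how sharding spreads records
    across servers, the sketch below assigns each
    record to a shard by hashing a shard key. It is
    a simplified model of hash partitioning only:
    MongoDB manages data placement itself, and this
    is not its algorithm. The function names and the
    choice of the parent directory id as the shard
    key are assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Iterable


def shard_for(shard_key: int, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(shard_keys: Iterable[int], num_shards: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(key, num_shards) for key in shard_keys)


if __name__ == "__main__":
    # Hypothetical workload: 1000 directories spread over 4 shards.
    for shard, count in sorted(distribution(range(1000), 4).items()):
        print(f"shard {shard}: {count} directories")
```

    With the parent directory id as the key, all
    entries of one directory map to the same shard,
    so a directory listing touches a single server
    while different directories spread across the
    cluster.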
  3. ANALYSIS
    In this section, we will analyze throughput and
    latency and compare our results with those of
    HopsFS [2]. Because we are implementing the
    NoSQL approach with MongoDB, we expect benefits
    such as high-performance writes, massive
    scalability, and support for a wide range of
    modern programming languages and tools. However,
    these come at the expense of some relational
    database safety features, such as referential
    integrity.
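    One way the throughput and latency measurements
    could be collected is sketched below: a small
    harness times each call to a metadata operation
    and reports operations per second together with
    median and 99th-percentile latency. The
    benchmark and percentile functions and the
    result keys are hypothetical, and the operation
    passed in is a placeholder for a real metadata
    request.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], fraction: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = math.ceil(fraction * len(sorted_values))
    return sorted_values[max(0, rank - 1)]


def benchmark(operation: Callable[[int], object], num_ops: int) -> Dict[str, float]:
    """Run the operation num_ops times and summarize throughput and latency."""
    if num_ops <= 0:
        raise ValueError("num_ops must be positive")
    latencies: List[float] = []
    start = time.perf_counter()
    for i in range(num_ops):
        t0 = time.perf_counter()
        operation(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_ops_per_s": num_ops / elapsed,
        "latency_p50_s": percentile(latencies, 0.50),
        "latency_p99_s": percentile(latencies, 0.99),
    }


if __name__ == "__main__":
    # Placeholder operation; a real run would issue a metadata request here.
    print(benchmark(lambda i: str(i).encode("utf-8"), 10_000))
```

    Running the same harness against both backends
    with the same operation mix keeps the comparison
    consistent; this single-threaded loop measures
    one client only, so scalability tests would need
    several concurrent clients.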
  4. RESPONSIBILITIES
    The goal of this project is to implement HopsFS
    using a NoSQL database. X will work on the
    implementation using MongoDB, and Y will work on
    comparing our implementation results with the
    HopsFS results in terms of scalability.
  5. REFERENCES
    [1] W. Jiang, L. Zhang, X. Liao, H. Jin, and
    Y. Peng. A novel clustered mongodb-based
    storage system for unstructured data with
    high availability. Computing,
    96(6):455–478, 2014.
    [2] S. Niazi, M. Ismail, S. Haridi, J. Dowling,
    S. Grohsschmiedt, and M. Ronström.
    HopsFS: Scaling hierarchical file system
    metadata using NewSQL databases. In
    15th USENIX Conference on File and
    Storage Technologies (FAST 17), pages
    89–104, 2017.