Wednesday, April 24, 2019

A brief walk-through of Azure Cosmos DB


Azure Cosmos DB


Azure Cosmos DB – an Introduction


A responsive application needs low latency and high availability. To achieve this, application instances are typically deployed in the datacenters closest to their users. The goal is a highly responsive application ecosystem in which, even in real-time scenarios, data reaches users within milliseconds.

When it comes to managing such data with Azure services, Azure Cosmos DB, a multi-model database service, makes a significant impact. Azure Cosmos DB is a globally distributed database service that lets you manage your data elastically and scale throughput and storage independently across Azure regions worldwide. In 2014, Microsoft introduced its first cloud-based NoSQL database, Azure DocumentDB. It was a document-oriented NoSQL database that offered a SQL-like query interface for retrieving document data. Azure Cosmos DB, launched in 2017, builds on its predecessor, Azure DocumentDB.

In brief, Azure Cosmos DB (formerly known as DocumentDB) is a multi-model NoSQL database on the Azure cloud platform. It can store and process massive amounts of structured, semi-structured and unstructured data. It natively supports several APIs for accessing your database, such as the MongoDB, Cassandra, Azure Table, Gremlin and SQL APIs.

Azure Cosmos DB


NoSQL – an Introduction


Azure Cosmos DB is a multi-model NoSQL database that provides independent scaling across all Azure regions. If you are already familiar with the concept of NoSQL, Cosmos DB will be much easier to grasp. If not, the following section gives a brief overview.

NoSQL


NoSQL databases are not new; they have been around for quite some time. NoSQL stands for Not Only SQL, or Not SQL, and the term was introduced by Carlo Strozzi in 1998. A NoSQL database is a non-relational database management system. The term covers all databases and data stores that are not based on Relational Database Management System (RDBMS) principles. NoSQL databases do not require a fixed schema, avoid joins, and are easy to scale. They are designed for distributed data stores that must handle huge volumes of data, and they typically scale horizontally.

In fact, NoSQL is used for big data and real-time web applications at companies like Google, Facebook and Amazon, which handle terabytes of data every day. With such massive volumes of data, an RDBMS can make system response times slow. Technically, there are two ways to address this: Scale Up (vertical) or Scale Out (horizontal).
  • Scale Up – Upgrade the existing hardware, giving a single machine more CPU, memory or storage.
  • Scale Out – Distribute the database load across multiple hosts as the load increases.


Scaling up (vertically) is relatively expensive, so scaling out (horizontally) is often preferred. Because NoSQL databases are non-relational, they scale out better than relational databases. A traditional RDBMS uses SQL syntax to store and retrieve data. NoSQL, by contrast, covers a wide range of database technologies that can store structured, semi-structured and unstructured data.

Scale Up - Out
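To make scale out more concrete, here is a minimal Python sketch of one simple way to spread a key space across hosts: hash each key and take the result modulo the number of hosts. The host names and the use of SHA-256 are illustrative assumptions. This is not how any particular NoSQL product routes data.

```python
import hashlib

def host_for_key(key: str, hosts: list) -> str:
    """Route a key to one host with simple hash-modulo partitioning."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[bucket]

def load_per_host(keys, hosts):
    """Count how many keys each host ends up storing."""
    load = {host: 0 for host in hosts}
    for key in keys:
        load[host_for_key(key, hosts)] += 1
    return load

keys = [f"user-{i}" for i in range(10_000)]
# Scale up: every key still lives on a single (bigger) machine.
print(load_per_host(keys, ["host-a"]))  # {'host-a': 10000}
# Scale out: the same keys are spread across three hypothetical hosts.
print(load_per_host(keys, ["host-a", "host-b", "host-c"]))
```

With three hosts, each one ends up with roughly a third of the keys. Hash-modulo routing is the simplest scheme, but it moves most keys whenever the number of hosts changes. Production systems therefore tend to use consistent hashing or range-based partitioning instead.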

Following are the key features of NoSQL databases – 
  • Non-relational data model
  • Runs well on clusters
  • Mostly open-source
  • Built for the new era of web applications
  • Schema-less
  • Often not fully ACID compliant (many trade strict ACID guarantees for availability and scale)
  • Supports horizontal scaling


There are four basic types of NoSQL databases – 
  1. Key-Value Based – One of the simplest NoSQL databases. It is essentially a big hash table of keys and values, and data is accessed by looking up its key.
  2. Document Based – Each key is paired with a complex data structure called a document. The database treats a document as a whole and does not split it into component name/value pairs.
  3. Column Based – Data is stored column by column, which makes it efficient for large data sets. Each storage block contains data from only one column.
  4. Graph Based – A network database that uses nodes, edges and properties to represent and store data.


NoSQL database types
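The four data models above can be illustrated with plain Python structures. This is only a sketch of how each model shapes the data, not an implementation of any real database. All keys and values are made up for the example.

```python
# 1. Key-value: an opaque value looked up by its key (essentially a hash table).
kv_store = {"session:42": "user=alice;cart_items=3"}

# 2. Document: the key maps to a whole nested document, read and written as one unit.
doc_store = {
    "order-1001": {
        "customer": "alice",
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
        "shipping": {"city": "Seattle"},
    }
}

# 3. Column-based: all values of one column are stored together,
#    so a query over one column never touches the others.
column_store = {
    "customer": ["alice", "bob", "carol"],
    "city": ["Seattle", "Pune", "Seattle"],
}

# 4. Graph: nodes and edges, both of which can carry properties.
nodes = {"alice": {"kind": "person"}, "bob": {"kind": "person"}, "carol": {"kind": "person"}}
edges = [("alice", "knows", "bob", {"since": 2015}), ("bob", "knows", "carol", {})]

def neighbors(node, relation="knows"):
    """Follow outgoing edges with the given label from `node`."""
    return [dst for src, label, dst, _props in edges if src == node and label == relation]

print(kv_store["session:42"])                                         # user=alice;cart_items=3
print(sum(item["qty"] for item in doc_store["order-1001"]["items"]))  # 3
print(column_store["city"].count("Seattle"))                          # 2
print(neighbors("alice"))                                             # ['bob']
```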

Azure Cosmos DB Features


In the previous section, we saw that NoSQL databases store and retrieve data differently from traditional relational databases. Microsoft enhanced DocumentDB and came up with Azure Cosmos DB, positioned as a globally distributed, multi-model database.

Following are the key capabilities of Azure Cosmos DB – 
  • Global Distribution – Cosmos DB transparently replicates your data across Azure regions, ensuring high availability and low latency.
  • High Scalability – Horizontal scaling (scale out) makes Cosmos DB highly scalable and durable for massive data. It is designed to scale throughput elastically worldwide.
  • Low Latency – Cosmos DB offers an SLA-backed guarantee of single-digit-millisecond (under 10 ms) latency for reads and indexed writes at the 99th percentile, served from the region closest to your users.
  • Multi-model and Multi-API database – Azure Cosmos DB is built on the atom-record-sequence (ARS) data model. Atoms are a small set of primitive data types, records are structures composed of these types, and sequences are arrays of atoms, records or sequences. On top of this model, it supports multiple data models such as documents, tables, graphs and key-value pairs.
  • Multiple Consistency Models – It offers a spectrum of five consistency levels (Strong, Bounded Staleness, Session, Consistent Prefix and Eventual). You can choose the trade-off between consistency on one side and availability, latency and throughput on the other.
  • Schema-free – Cosmos DB automatically indexes all the data it ingests without requiring any schema. Since no schema or index management is required, you don’t have to worry about application downtime during schema migrations.
  • Tooling – Apart from the different APIs, you can use tools that simplify many operations, such as the Data Migration Tool (dtui.exe and dt.exe), the Azure Cosmos DB Emulator, Data Explorer and the Capacity Planner.


Azure Cosmos DB key capabilities
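The schema-free indexing and the atom-record-sequence model can be illustrated together with a small toy. A document is walked as a tree of records (dicts) and sequences (lists), and every leaf atom is indexed under its path, so documents of different shapes become queryable without declaring any schema first. This is a hypothetical, insert-only sketch and not Cosmos DB's indexer. The path notation only loosely resembles the paths used in Cosmos DB indexing policies, and the class and function names are made up for the example.

```python
from collections import defaultdict
from typing import Any, Iterator, Tuple

def leaf_paths(value: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Walk a JSON value and yield (path, atom) for every primitive leaf."""
    if isinstance(value, dict):        # record: named fields
        for key, child in value.items():
            yield from leaf_paths(child, f"{path}/{key}")
    elif isinstance(value, list):      # sequence: elements share the "[]" segment
        for child in value:
            yield from leaf_paths(child, f"{path}/[]")
    else:                              # atom: string, number, boolean or null
        yield path, value

class AutoIndex:
    """Toy insert-only inverted index: every leaf of every document is indexed."""

    def __init__(self):
        self.index = defaultdict(set)  # (path, type, atom) -> ids of matching documents

    def add(self, doc: dict) -> None:
        # No schema or index definition is needed: whatever shape the document has,
        # each of its leaf paths becomes searchable.
        for path, atom in leaf_paths(doc):
            # The type is part of the key so that True and 1 are not conflated.
            self.index[(path, type(atom), atom)].add(doc["id"])

    def find(self, path: str, value: Any) -> list:
        return sorted(self.index.get((path, type(value), value), set()))

idx = AutoIndex()
idx.add({"id": "1", "name": "Alice", "address": {"city": "Seattle"}})
idx.add({"id": "2", "name": "Bob", "tags": ["admin", "ops"]})  # a different shape
print(idx.find("/address/city", "Seattle"))  # ['1']
print(idx.find("/tags/[]", "ops"))           # ['2']
```

Indexing every path makes any field queryable immediately, at the cost of extra write work and storage. Cosmos DB lets you tune that trade-off through an indexing policy.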

Multi-API Support 


In the earlier version of Azure's NoSQL database (i.e., DocumentDB), only JSON documents were supported. With Azure Cosmos DB, you get multiple APIs, and with them you can store and process different types of data, such as – 
  • Table
  • Key-Value
  • Document
  • Graph
  • Column


Azure Cosmos DB gives you the flexibility to choose from a variety of APIs to access this data, with SDKs available in different languages – 
  1. SQL API – Azure Cosmos DB natively supports querying documents (items) using SQL over JSON data. It treats entities as JSON documents.
  2. MongoDB API – Used to connect applications written for MongoDB to Azure Cosmos DB. It works with MongoDB’s binary version of JSON, called BSON. With minimal changes, you can migrate MongoDB-based applications to Cosmos DB.
  3. Cassandra API – Based on column-oriented (wide-column) data storage. With this API, Cassandra-based applications can be migrated to Cosmos DB.
  4. Graph (Gremlin) API – Used when data needs to be annotated with meaningful relationships. It supports modeling graph data and provides APIs to traverse it.
  5. Table API – An evolution of Azure Table Storage. Applications using Azure Table Storage can be migrated to Azure Cosmos DB with no code changes.


At first glance, these APIs look different in terms of structure, but they all share the same underlying Cosmos DB capabilities. The choice of API is entirely up to your requirements and mostly depends on the kind of data you are dealing with.

Cosmos DB APIs
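To show that the APIs mostly differ in how the same data is shaped, the sketch below takes one hypothetical customer record and reshapes it the way three of the APIs present data. In the SQL API the item is a JSON document. In the Table API it becomes an entity keyed by PartitionKey and RowKey. In the Gremlin API it becomes a vertex with a label and properties. These are illustrative shapes only, not the exact wire formats. The helper functions and the choice of `city` as partition key are assumptions made for the example.

```python
import json

# A hypothetical customer record used for all three shapes.
customer = {"id": "cust-42", "name": "Alice", "city": "Seattle"}

def to_table_entity(doc: dict, partition_field: str) -> dict:
    """Table-style entity: PartitionKey + RowKey plus flat properties."""
    entity = {"PartitionKey": str(doc[partition_field]), "RowKey": str(doc["id"])}
    entity.update({k: v for k, v in doc.items() if k != "id"})
    return entity

def to_graph_vertex(doc: dict, label: str) -> dict:
    """Graph-style vertex: an id, a label and a bag of properties."""
    return {"id": doc["id"], "label": label,
            "properties": {k: v for k, v in doc.items() if k != "id"}}

# SQL API: the item is the JSON document itself, queried with SQL such as
#   SELECT * FROM c WHERE c.city = 'Seattle'
print(json.dumps(customer))
# Table API: the same data addressed by PartitionKey/RowKey.
print(json.dumps(to_table_entity(customer, "city")))
# Gremlin API: the same data as a vertex, added roughly like
#   g.addV('customer').property('id', 'cust-42').property('name', 'Alice')
print(json.dumps(to_graph_vertex(customer, "customer")))
```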

Azure Cosmos DB – brief Technical Outline


You can create Azure Cosmos DB by provisioning a database account (we will cover this in the next hands-on post), which manages one or more databases. Each database, in turn, manages users, permissions and containers. An Azure Cosmos DB container is a schema-agnostic container of items, stored procedures, triggers, user-defined functions, etc. The entities under the database account, including databases, users, permissions and containers, are referred to as resources.

In simple words, Azure Cosmos DB follows this hierarchy – 

Cosmos DB Container
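The hierarchy described above (account → databases → users/permissions and containers → items, stored procedures, triggers and UDFs) can be modeled with a few dataclasses. This is a hypothetical sketch of the resource model, not an SDK. The `item_link` helper only mirrors the `dbs/{db}/colls/{container}/docs/{item}` link style used by the Cosmos DB REST API, and all account, database and container names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Container:
    id: str
    partition_key_path: str                                  # e.g. "/customerId"
    items: Dict[str, dict] = field(default_factory=dict)
    stored_procedures: Dict[str, str] = field(default_factory=dict)
    triggers: Dict[str, str] = field(default_factory=dict)
    user_defined_functions: Dict[str, str] = field(default_factory=dict)

@dataclass
class Database:
    id: str
    users: Dict[str, List[str]] = field(default_factory=dict)   # user id -> permission ids
    containers: Dict[str, Container] = field(default_factory=dict)

@dataclass
class Account:
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)

def item_link(db: Database, container: Container, item_id: str) -> str:
    """Build a resource link in the dbs/.../colls/.../docs/... style."""
    return f"dbs/{db.id}/colls/{container.id}/docs/{item_id}"

account = Account("contoso-demo")                                   # hypothetical names
retail = account.databases.setdefault("retail", Database("retail"))
orders = retail.containers.setdefault("orders", Container("orders", "/customerId"))
orders.items["o1"] = {"id": "o1", "customerId": "alice"}
retail.users["reporting"] = ["read-orders"]
print(item_link(retail, orders, "o1"))  # dbs/retail/colls/orders/docs/o1
```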


Depending on the API you choose, container and item resources are projected as specialized resource types. For example, a container surfaces as a collection in the MongoDB API, a table in the Cassandra and Table APIs, and a graph in the Gremlin API. A container is horizontally partitioned and replicated across multiple regions. The items you add to a container, and the throughput you provision on it, are automatically distributed across a set of logical partitions based on the partition key.

Cosmos DB resource model
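The partitioning step can be sketched as a two-level mapping. All items that share a partition key value form one logical partition. Each logical partition is then placed on a physical partition, which here owns a contiguous range of hash values. At a high level this resembles how Microsoft describes partition key ranges. However, the SHA-256 hash, the equal-width ranges and all function names below are assumptions for illustration, not Cosmos DB's actual internals.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 64

def pk_hash(pk_value: str) -> int:
    """Map a partition key value to a 64-bit integer (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(pk_value.encode("utf-8")).digest()[:8], "big")

def physical_for(pk_value: str, physical_count: int) -> int:
    """Each physical partition owns an equal, contiguous range of the hash space."""
    return pk_hash(pk_value) * physical_count // HASH_SPACE

def place(items, pk_field: str, physical_count: int):
    logical = defaultdict(list)   # partition key value -> item ids (one logical partition)
    for item in items:
        logical[item[pk_field]].append(item["id"])
    physical = defaultdict(list)  # physical partition -> logical partitions it hosts
    for pk_value in logical:
        physical[physical_for(str(pk_value), physical_count)].append(pk_value)
    return dict(logical), dict(physical)

orders = [
    {"id": "o1", "customerId": "alice"},
    {"id": "o2", "customerId": "bob"},
    {"id": "o3", "customerId": "alice"},
    {"id": "o4", "customerId": "carol"},
]
logical, physical = place(orders, "customerId", physical_count=2)
print(logical)   # {'alice': ['o1', 'o3'], 'bob': ['o2'], 'carol': ['o4']}
print(physical)  # which physical partition each customer lands on depends on the hash
```

Because a whole logical partition always moves as one unit, a physical partition can be split by cutting its hash range in half. Only the logical partitions in that range are affected. This is one reason the choice of partition key matters so much for spreading load evenly.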

In this post, we briefly covered that Azure Cosmos DB is a globally distributed, highly scalable, multi-model service that lets you elastically scale throughput and storage across geographical regions. We didn’t deep dive into the technical or architectural details. For that, you can go through Microsoft’s article – 


Building on this conceptual overview, in the next post we will work with Azure Cosmos DB through some hands-on activities, so keep visiting the blog.

