What is Azure Table Storage (Study Notes)

Azure Table Storage is "A NoSQL key-value store for rapid development using massive semi-structured datasets", as the Microsoft Azure site for Table Storage defines it.
It also:

  • can store petabytes of structured data
  • supports a flexible data schema
  • is made for the enterprise

The fact that we can store structured, but non-relational, data got me interested. I started reading about it on the Azure website and in several posts by Julie Lerman. I found one of her posts especially interesting and informative, so I started writing down notes. What follows is what I took away from it.

Relational databases have various tables, each containing a predefined set of columns, one or more of which are typically designated as identity keys. Tables use these keys to define relationships among one another.

Azure Table storage, on the other hand, seems a bit mysterious to those of us who are so used to working with relational databases.

Storing Data for Efficient Retrieval and Persistence

  • By design, the Azure Table service can store enormous amounts of data while enabling efficient access and persistence.
  • We only have to deal with the data, without worrying about constraints, views, indexes, relationships, or stored procedures.
  • Table Storage uses keys that enable efficient querying, chiefly the "PartitionKey".
  • This key is also used for load balancing if the Table service decides to spread the data over multiple servers.
  • A table doesn't have a specified schema.
  • It is a structured container of rows, which doesn't care what a row actually looks like.
  • We can store rows with varying structures in a single table.
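To make "a container of rows that doesn't care what a row looks like" concrete, here is a minimal sketch, assuming a toy in-memory model rather than the real Azure SDK: a row is found by its (PartitionKey, RowKey) pair, and every other property is a free-form dictionary. The `put_row` helper and the sample data are hypothetical.

```python
from typing import Any, Dict, Tuple

# Toy, in-memory stand-in for one Azure table (this is NOT the Azure SDK).
# A row is identified by (PartitionKey, RowKey); all other properties are a
# free-form dict, so rows in the same table can have different shapes.
Table = Dict[Tuple[str, str], Dict[str, Any]]


def put_row(table: Table, row: Dict[str, Any]) -> None:
    """Hypothetical helper: store a row under its keys, replacing any row with the same keys."""
    key = (row["PartitionKey"], row["RowKey"])
    table[key] = {k: v for k, v in row.items() if k not in ("PartitionKey", "RowKey")}


customers: Table = {}
# A contact row and an address row with different properties, in one table.
put_row(customers, {"PartitionKey": "Contoso", "RowKey": "contact_1",
                    "Name": "Ann", "Email": "ann@example.com"})
put_row(customers, {"PartitionKey": "Contoso", "RowKey": "address_1",
                    "Street": "1 Main St", "City": "Seattle"})

for (pk, rk), props in sorted(customers.items()):
    print(pk, rk, sorted(props))
```

No schema is declared anywhere; the two rows simply carry different property names.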

It All Begins with Your Domain Classes

  • Traditionally, with relational databases we define a table with a particular structure: specific columns, a data type for each column, and relationships with other tables.
  • With the Azure Table service, we don't design a database; we just design our classes. We define the classes and a container (a TABLE) that one or more classes belong to. Then we save instances of those classes as rows.
  • Important: each class must have three properties that are critical to how the Table service does its job:
    • PartitionKey - string
    • RowKey - string
    • Timestamp - date/time, maintained by the service
  • Choosing the PartitionKey and RowKey to get the best balance of query and transaction efficiency is a challenge. For more information, refer to the PDC09 session "Azure Tables and Queues Deep Dive".
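The "just design your classes" idea could be sketched as follows, assuming Python dataclasses in place of .NET classes. The property names match the three required ones above; `Contact`, its fields, and the `save` function are hypothetical, and `save` only imitates the service stamping `Timestamp` on a write.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TableEntity:
    """The three properties every stored class carries (names as in Azure Table storage)."""
    PartitionKey: str
    RowKey: str
    # Set by the service on each write; the client doesn't choose it.
    Timestamp: Optional[datetime] = None


@dataclass
class Contact(TableEntity):
    # Hypothetical domain properties; any extra fields become row properties.
    Name: str = ""
    Email: str = ""


def save(entity: TableEntity) -> TableEntity:
    """Hypothetical stand-in for a save call: stamps the entity the way the service would."""
    entity.Timestamp = datetime.now(timezone.utc)
    return entity


c = save(Contact(PartitionKey="Contoso", RowKey="contact_1", Name="Ann"))
print(c.PartitionKey, c.RowKey, c.Timestamp is not None)
```

Putting the three key properties in a shared base class keeps each domain class focused on its own data.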

PartitionKeys and RowKeys Drive Performance and Scalability

  • The string properties PartitionKey and RowKey work as the index for the table.
  • Together, these properties also provide uniqueness: they act as the primary key for a row, hence the challenge of getting the combination right.
  • Each entity in a table must have a unique PartitionKey/RowKey combination.
  • Beyond querying, the PartitionKey is also used to partition tables, which provides load balancing and scalability.
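A minimal sketch of how the two keys cooperate, assuming a toy in-memory model (the `ToyTable` class is hypothetical, not an Azure API): rows are grouped by PartitionKey, the RowKey only has to be unique inside its partition, and inserting an existing (PartitionKey, RowKey) pair is rejected.

```python
from typing import Any, Dict


class ToyTable:
    """Hypothetical in-memory model of one table: PartitionKey -> {RowKey -> properties}.

    Each partition is a separate dict here; in the real service, separate
    partitions are what can be spread across servers for load balancing.
    """

    def __init__(self) -> None:
        self.partitions: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def insert(self, pk: str, rk: str, **props: Any) -> None:
        partition = self.partitions.setdefault(pk, {})
        if rk in partition:
            # (PartitionKey, RowKey) is the primary key, so a duplicate is rejected.
            raise KeyError(f"entity ({pk!r}, {rk!r}) already exists")
        partition[rk] = props

    def get(self, pk: str, rk: str) -> Dict[str, Any]:
        # A point lookup: go straight to one partition, then one row.
        return self.partitions[pk][rk]


t = ToyTable()
t.insert("Books", "001", Title="Title A")
t.insert("Movies", "001", Title="Title B")  # same RowKey, other partition: allowed
try:
    t.insert("Books", "001", Title="Duplicate")
except KeyError as err:
    print("rejected:", err)
```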

Digging Deeper into Partition Keys and Querying

  • Queries to Azure tables from the .NET Framework use LINQ to REST.
  • The context class derives from the WCF Data Services context (System.Data.Services.Client.DataServiceContext).
  • The BIG difference between querying OData (exposed by WCF Data Services) and querying Azure Table storage is that Table storage doesn't support string functions such as StartsWith or Contains.
  • Instead, String.CompareTo can be used to search on part of a string, such as a prefix, by expressing the search as a range of values.
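The prefix-as-range trick works because every string that starts with a prefix sorts at or after the prefix and strictly before the prefix with its last character incremented. Here is a small Python sketch of that reasoning, assuming plain code-point ordering; `prefix_range` is a hypothetical helper, and in the LINQ query the two bounds would be the two `CompareTo` comparisons.

```python
from typing import Tuple


def prefix_range(prefix: str) -> Tuple[str, str]:
    """Return (lower, upper) such that lower <= s < upper exactly when s starts with prefix."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    # Increment the last character: "Smi" -> "Smj". Every string starting with
    # "Smi" sorts at or after "Smi" and strictly before "Smj".
    # (Assumes the last character isn't the highest code point.)
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


names = ["Smith", "Smithers", "Smyth", "Sm", "Snow", "Adams"]
lo, hi = prefix_range("Smi")
# The same test a server can evaluate with only ">=" and "<" on strings.
matches = [n for n in names if lo <= n < hi]
print(matches)
```

Only comparison operators are needed, which is why this still works where string functions are unavailable.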

Parallel Querying for Full Table Scans

  • If we scan an entire large table, Azure returns at most 1,000 rows per request, or only what it can process within 5 seconds.
  • Azure returns those rows with a "continuation token", and the client has to go back with that token to get more rows. This can be a tedious, sequential process.
  • Instead, we can build the queries first: iterate through the known list of categories (the PartitionKeys), build one query per category, and then send those queries off to run in parallel.
  • More design considerations for querying:
    • Together with the PartitionKey, the RowKey property defines each row's uniqueness within a table.
    • A GUID would guarantee uniqueness but doesn't help with searching or sorting.
    • Instead, we can use a combination of values, which can also help with sorting the data. If the application doesn't search or sort on every value in the key, we can still include a GUID in the combination to guarantee uniqueness.
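The difference between following continuation tokens one request at a time and fanning out one query per known partition can be sketched like this. It is a toy simulation under stated assumptions: `query_page` is a hypothetical stand-in for the service, the token is simply the first row of the next page, and only the 1,000-row limit is modeled (not the 5-second limit).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 1000  # per-request row limit described above
Row = Tuple[str, str]  # (PartitionKey, RowKey); rows are kept sorted by this pair


def query_page(rows: List[Row], partition: Optional[str],
               token: Optional[Row]) -> Tuple[List[Row], Optional[Row]]:
    """Toy server: up to PAGE_SIZE matching rows starting at `token`, plus the next token."""
    matching = [r for r in rows
                if (partition is None or r[0] == partition)
                and (token is None or r >= token)]
    page = matching[:PAGE_SIZE]
    next_token = matching[PAGE_SIZE] if len(matching) > PAGE_SIZE else None
    return page, next_token


def query_all(rows: List[Row], partition: Optional[str] = None) -> List[Row]:
    """Client loop: follow continuation tokens one request at a time."""
    results: List[Row] = []
    token: Optional[Row] = None
    while True:
        page, token = query_page(rows, partition, token)
        results.extend(page)
        if token is None:
            return results


def query_partitions_in_parallel(rows: List[Row],
                                 partitions: List[str]) -> Dict[str, List[Row]]:
    """Build one query per known PartitionKey and run the queries concurrently."""
    with ThreadPoolExecutor() as pool:
        return dict(zip(partitions, pool.map(lambda pk: query_all(rows, pk), partitions)))


rows = sorted((cat, f"{i:05d}") for cat in ("Books", "Movies", "Music") for i in range(1500))
everything = query_all(rows)  # sequential: 5 round trips of at most 1000 rows
by_category = query_partitions_in_parallel(rows, ["Books", "Movies", "Music"])
```

Threads are a reasonable choice here because real table queries spend their time waiting on the network; in this in-memory toy they only show the shape of the fan-out.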

Rethinking Relationships

  • Since there are no relationships between tables (containers), our code is responsible for them, which affects how queries and updates are done.
  • Transactions can't span multiple tables, so instead of spreading related data across tables, we can use Azure Table storage's unique ability to store rows with varying schemas in a single table.
  • We can use the PartitionKey to link two or more kinds of entities within one table, and use the RowKey to tell them apart.
    • Example: when saving contact and address information in one table, the PartitionKey can be shared by the two, while the RowKey separates the two entities.
    • This common PartitionKey ensures that the rows always stay together in the same partition, so we can take advantage of a feature called "Entity Group Transactions" (EGT).
    • EGT allows a single operation to carry out a transaction across multiple entities with the same PartitionKey.
    • Another benefit is that we can update all the entities in the group with one transactional request.
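A minimal sketch of the rules an entity group transaction enforces, assuming a toy in-memory table (`execute_batch` is hypothetical, not an Azure API). Every entity must share one PartitionKey, an entity may appear only once, and a batch is limited to 100 operations. All checks run before any write, so the batch applies completely or not at all.

```python
from typing import Any, Dict, List, Tuple

MAX_BATCH = 100  # Azure's documented limit of operations per entity group transaction
Key = Tuple[str, str]  # (PartitionKey, RowKey)
Op = Tuple[str, str, Dict[str, Any]]  # (PartitionKey, RowKey, properties)


def execute_batch(table: Dict[Key, Dict[str, Any]], ops: List[Op]) -> None:
    """Hypothetical toy entity group transaction: write every op, or none of them."""
    if not ops or len(ops) > MAX_BATCH:
        raise ValueError(f"a batch needs between 1 and {MAX_BATCH} operations")
    if len({pk for pk, _, _ in ops}) != 1:
        raise ValueError("all entities in a batch must share one PartitionKey")
    row_keys = [rk for _, rk, _ in ops]
    if len(set(row_keys)) != len(row_keys):
        raise ValueError("each entity may appear only once in a batch")
    # Every check ran before any write, so a rejected batch changes nothing.
    for pk, rk, props in ops:
        table[(pk, rk)] = props


table: Dict[Key, Dict[str, Any]] = {}
# Contact and address share the PartitionKey, so they can be saved together.
execute_batch(table, [("Contoso", "contact_1", {"Name": "Ann"}),
                      ("Contoso", "address_1", {"City": "Seattle"})])
try:
    execute_batch(table, [("Contoso", "contact_1", {"Name": "Bob"}),
                          ("Fabrikam", "contact_9", {"Name": "Cy"})])
except ValueError as err:
    print("batch rejected:", err)
```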

References

A big THANK YOU to Julie Lerman for all her work on all things DATA. Please check out her blog, mentioned above, and follow her on Twitter.

Thank You
Vijaya Malla.
@vijayamalla