Cassandra Basics



#cassandra  #basics  #documentation 



Concepts

Partition Key, Clustering Key and Primary Key

Quoted from the article "Partition Key vs Composite Key vs Clustering Columns in Cassandra":

In brief, each table requires a unique primary key. The first field listed is the partition key, since its hashed value is used to determine the node to store the data. If those fields are wrapped in parentheses then the partition key is composite. Otherwise the first field is the partition key. Any fields listed after the partition key are called clustering columns. These store data in ascending or descending order within the partition for the fast retrieval of similar values. All the fields together are the primary key.
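To make the rules in the quote concrete, here is a minimal Python sketch that splits the column list inside a CQL `PRIMARY KEY (...)` clause into partition key columns and clustering columns. The function name `split_primary_key` and the column names are hypothetical, and it only handles the simple syntax described above (no quoted identifiers or nested types); it is not how Cassandra itself parses CQL.

```python
def split_primary_key(clause: str) -> tuple:
    """Split the inside of a PRIMARY KEY (...) clause into
    (partition key columns, clustering columns). Hypothetical helper."""
    inner = clause.strip()
    if inner.startswith("("):
        # Parenthesized group first: a composite partition key.
        close = inner.index(")")
        partition = [c.strip() for c in inner[1:close].split(",")]
        rest = inner[close + 1:]
    else:
        # No parentheses: only the first column is the partition key.
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    # Everything after the partition key is a clustering column.
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering

print(split_primary_key("(sensor_id, day), ts"))  # (['sensor_id', 'day'], ['ts'])
print(split_primary_key("user_id, created_at"))   # (['user_id'], ['created_at'])
print(split_primary_key("user_id"))               # (['user_id'], [])
```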

Note: The Partition Key is responsible for data distribution across your nodes. The Clustering Key is responsible for data sorting within the partition.
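The division of labor in the note can be sketched with a toy in-memory cluster: only the partition key decides which node holds the data, and the clustering key decides the order of rows inside that partition. This is a simplification under stated assumptions: `NUM_NODES`, `node_for`, and `upsert` are hypothetical, and modulo hashing with MD5 stands in for Cassandra's token ring (Cassandra's default partitioner is Murmur3Partitioner).

```python
import bisect
import hashlib

NUM_NODES = 3  # hypothetical cluster size

def node_for(partition_key: tuple) -> int:
    """Choose a node from a hash of the partition key only.
    (Toy modulo hashing, not Cassandra's token ring.)"""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES

# node -> partition key -> rows kept sorted by clustering key
cluster = {n: {} for n in range(NUM_NODES)}

def upsert(partition_key: tuple, clustering_key: tuple, value) -> None:
    rows = cluster[node_for(partition_key)].setdefault(partition_key, [])
    keys = [ck for ck, _ in rows]
    i = bisect.bisect_left(keys, clustering_key)
    if i < len(rows) and rows[i][0] == clustering_key:
        rows[i] = (clustering_key, value)        # same primary key: overwrite
    else:
        rows.insert(i, (clustering_key, value))  # keep clustering order

# Rows arrive out of order but are stored sorted within the partition.
upsert(("sensor-1",), (3,), "c")
upsert(("sensor-1",), (1,), "a")
upsert(("sensor-1",), (2,), "b")
print(cluster[node_for(("sensor-1",))][("sensor-1",)])
# [((1,), 'a'), ((2,), 'b'), ((3,), 'c')]
```

All three rows share the partition key, so they land on the same node regardless of their clustering values.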

Here are the detailed definitions from the Cassandra documentation:

Partition

A partition is defined by the partition key. As we can see, a partition can have multiple rows. This makes sense because the partition key is only part of the primary key, and it is the full primary key that uniquely identifies a row in the table.

The location of a partition in the data is stored in an index. Conceptually, this index is a map from partition key to the location of the partition, so finding a partition is roughly a get operation on a hash map. The rows in a partition are sorted by the clustering columns for efficient reads, so locating a row within a partition conceptually involves some kind of binary search.
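The two-step lookup described above can be sketched in Python: a dictionary plays the role of the partition index (partition key to a span in a "data file"), and `bisect` performs the binary search over clustering keys inside that span. Cassandra's real on-disk index format is more involved; the data, `partition_index`, and `read_row` here are hypothetical and only illustrate the conceptual model.

```python
import bisect

# Toy "data file": rows laid out partition by partition, each partition's
# rows sorted by clustering key (second field). Hypothetical data.
data_file = [
    ("alice", 1, "login"), ("alice", 5, "view"), ("alice", 9, "logout"),
    ("bob", 2, "login"), ("bob", 3, "buy"),
]

# Partition index: partition key -> (start, end) offsets in data_file.
partition_index = {}
for offset, (pk, _, _) in enumerate(data_file):
    start, _ = partition_index.get(pk, (offset, offset))
    partition_index[pk] = (start, offset + 1)

def read_row(pk: str, ck: int):
    span = partition_index.get(pk)  # step 1: hash-map get finds the partition
    if span is None:
        return None
    start, end = span
    # Copying the keys keeps the sketch simple; a real reader would
    # search the sorted rows in place.
    keys = [row[1] for row in data_file[start:end]]
    i = bisect.bisect_left(keys, ck)  # step 2: binary search on clustering key
    if i < len(keys) and keys[i] == ck:
        return data_file[start + i]
    return None

print(read_row("alice", 5))  # ('alice', 5, 'view')
print(read_row("bob", 4))    # None
```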

We can find some information about index data in the description of SSTables. SSTables are the immutable data files that Cassandra uses to persist data on disk. Among other things, an SSTable contains:


Wide Partition

A wide partition is simply a partition with a lot of data (i.e., many rows). We should avoid wide partitions because a partition is the fundamental unit of replication in Cassandra, and operating on it requires work and coordination. Keeping partitions small keeps that work and coordination incremental. Wide partitions are the root cause of many issues in Cassandra. A rule of thumb is not to go beyond 100 MB per partition; however, a good data model should keep partitions much smaller.
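A back-of-the-envelope estimate (rows per partition times average row size) is often enough to spot a wide partition before it happens. The sketch below uses a hypothetical sensor table with one row per second and an assumed average row size of 100 bytes; both numbers are illustrative, not measurements. It compares a partition per sensor holding 30 days of data with a composite partition key per sensor per day.

```python
MB = 1_000_000                   # decimal megabytes, good enough for a rough check
RULE_OF_THUMB_BYTES = 100 * MB   # the ~100 MB guideline mentioned above

SECONDS_PER_DAY = 86_400
AVG_ROW_BYTES = 100              # hypothetical average row size

def partition_bytes(rows: int, avg_row_bytes: int) -> int:
    """Rough partition size: number of rows times average row size."""
    return rows * avg_row_bytes

# One row per second per sensor (hypothetical workload).
per_sensor_30_days = partition_bytes(30 * SECONDS_PER_DAY, AVG_ROW_BYTES)
per_sensor_per_day = partition_bytes(SECONDS_PER_DAY, AVG_ROW_BYTES)

for label, size in [
    ("PRIMARY KEY (sensor_id, ts), 30 days", per_sensor_30_days),
    ("PRIMARY KEY ((sensor_id, day), ts)", per_sensor_per_day),
]:
    verdict = "too wide" if size > RULE_OF_THUMB_BYTES else "ok"
    print(f"{label}: {size / MB:.1f} MB -> {verdict}")
# PRIMARY KEY (sensor_id, ts), 30 days: 259.2 MB -> too wide
# PRIMARY KEY ((sensor_id, day), ts): 8.6 MB -> ok
```

Note that the first partition keeps growing forever, while the second is bounded by the number of rows one sensor can write in a day.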

Indexing

According to the Cassandra documentation:

An index provides a means to access data in Cassandra using attributes other than the partition key. The benefit is fast, efficient lookup of data matching a given condition. The index indexes column values in a separate, hidden table from the one that contains the values being indexed.

We often hear the terms primary index and secondary index. In Cassandra, the primary index is simply the primary key: as we saw in the Partition section, the primary key (partition key + clustering columns) lets us locate a row. A secondary index is any index that is not the primary index.

Secondary indexes in Cassandra are less straightforward than they appear. The main reason is that a given secondary index value may match rows in many partitions, which means Cassandra may need to query several different nodes. Here are some online resources that discuss this issue:
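The fan-out problem can be sketched with a toy cluster in which, as with Cassandra's built-in secondary indexes, each node keeps a local hidden index covering only the rows it stores. A lookup by primary key goes to the single node that owns the partition, while a lookup by indexed value has to ask every node, because the value says nothing about where matching rows live. The table, `node_for`, and the modulo placement are hypothetical simplifications, not Cassandra's actual query path.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size

def node_for(partition_key: str) -> int:
    # Toy placement by hashing the partition key (not Cassandra's token ring).
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % NUM_NODES

# Hypothetical "users" rows: user_id is the partition key, city is indexed.
users = {"u1": "Paris", "u2": "Berlin", "u3": "Paris", "u4": "Paris", "u5": "Rome"}

# Each node stores its own rows plus a local hidden index: city -> user_ids.
node_rows = defaultdict(dict)
node_index = defaultdict(lambda: defaultdict(set))
for user_id, city in users.items():
    n = node_for(user_id)
    node_rows[n][user_id] = city
    node_index[n][city].add(user_id)

def get_by_primary_key(user_id: str):
    """Return (row, nodes contacted): one node owns the partition."""
    return node_rows[node_for(user_id)].get(user_id), 1

def get_by_city(city: str):
    """Return (matching user_ids, nodes contacted): every local index is asked."""
    hits = set()
    for n in range(NUM_NODES):
        hits |= node_index[n].get(city, set())
    return sorted(hits), NUM_NODES

print(get_by_primary_key("u2"))  # ('Berlin', 1)
print(get_by_city("Paris"))      # (['u1', 'u3', 'u4'], 4)
```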

Cardinality

The number of unique values in a column. For example, a column of employee ID numbers, unique for each employee, would have high cardinality; a column of employee ZIP codes would have low cardinality.

An index on a column with low cardinality can boost read performance since the index is significantly smaller than the column. An index for a high-cardinality column may reduce performance. If your application requires a search on a high-cardinality column, a materialized view might be a better choice. (source)
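Cardinality is easy to measure on sample data: count the distinct values in a column. The sketch below uses a small hypothetical employee list mirroring the example above; the ID column has one row per distinct value, while the ZIP column groups several rows under each value.

```python
# Hypothetical sample rows mirroring the employee example above.
employees = [
    {"employee_id": 101, "zip": "10001"},
    {"employee_id": 102, "zip": "10001"},
    {"employee_id": 103, "zip": "94105"},
    {"employee_id": 104, "zip": "10001"},
    {"employee_id": 105, "zip": "94105"},
    {"employee_id": 106, "zip": "60601"},
]

def cardinality(rows, column: str) -> int:
    """Number of distinct values in one column."""
    return len({row[column] for row in rows})

for column in ("employee_id", "zip"):
    c = cardinality(employees, column)
    print(f"{column}: {c} distinct values, {len(employees) / c:.1f} rows per value")
# employee_id: 6 distinct values, 1.0 rows per value
# zip: 3 distinct values, 2.0 rows per value
```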

Serial Consistency Level and Lightweight Transactions

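A lightweight transaction (LWT) is similar to a compare-and-set operation, i.e., a conditional write: the write is applied only if its condition holds. Cassandra implements LWTs with Paxos, and the serial consistency level (SERIAL or LOCAL_SERIAL) sets the consistency of that Paxos phase.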

Example:

DELETE ... IF EXISTS
INSERT ... IF NOT EXISTS
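The compare-and-set semantics of these statements can be sketched in a single process, where a lock makes the check and the write atomic. Each method returns a boolean playing the role of the `[applied]` column that Cassandra returns for conditional statements; `update_if` mirrors an `UPDATE ... IF column = value` condition. `ToyTable` and its methods are hypothetical, and the sketch deliberately leaves out what makes real LWTs expensive: reaching agreement across replicas with Paxos.

```python
import threading

class ToyTable:
    """Single-process stand-in for conditional writes (no Paxos, no replicas)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # makes check-then-write atomic

    def insert_if_not_exists(self, key, value) -> bool:
        with self._lock:
            if key in self._rows:
                return False           # like [applied] = False
            self._rows[key] = value
            return True                # like [applied] = True

    def update_if(self, key, expected, new) -> bool:
        with self._lock:
            if key not in self._rows or self._rows[key] != expected:
                return False           # condition failed: nothing written
            self._rows[key] = new
            return True

    def delete_if_exists(self, key) -> bool:
        with self._lock:
            if key not in self._rows:
                return False
            del self._rows[key]
            return True

t = ToyTable()
print(t.insert_if_not_exists("alice", 1))         # True
print(t.insert_if_not_exists("alice", 2))         # False
print(t.update_if("alice", expected=2, new=3))    # False
print(t.update_if("alice", expected=1, new=3))    # True
print(t.delete_if_exists("alice"))                # True
print(t.delete_if_exists("alice"))                # False
```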

Anti-patterns

Issues to be aware of

Notes from the white paper

----- END -----

If you have questions about this post, you could find me on Discord.