Table of Contents
- Related Readings
- Partition Key, Clustering Key and Primary Key
- Wide Partition
- Serial Consistency Level and Lightweight Transaction
- Issues to be aware of
- Notes from the white paper
- Cassandra - A Decentralized Structured Storage System
- Cassandra Glossary
- Cassandra Anti-patterns by Edison Stark
- How is data read
Partition Key, Clustering Key and Primary Key
In brief, each table requires a primary key that uniquely identifies each row. The first field listed in the primary key is the partition key; its hashed value determines which node stores the data. If the leading fields are wrapped in an extra pair of parentheses, those fields together form a composite partition key. Any fields listed after the partition key are called clustering columns. They store data in ascending or descending order within the partition so that similar values can be retrieved quickly. All the fields together form the primary key.
Note: The Partition Key is responsible for data distribution across your nodes. The Clustering Key is responsible for data sorting within the partition.
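To make the two roles concrete, here is a minimal in-memory sketch. It is not Cassandra code: the function name `group_into_partitions` and the sensor rows are hypothetical, and it only illustrates that rows sharing a partition key end up in one partition, where they are kept sorted by the clustering columns.

```python
from collections import defaultdict

def group_into_partitions(rows, partition_cols, clustering_cols):
    """Group rows by partition key, then sort each group by clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in partition_cols)
        partitions[key].append(row)
    for group in partitions.values():
        group.sort(key=lambda r: tuple(r[c] for c in clustering_cols))
    return dict(partitions)

# Hypothetical table with PRIMARY KEY ((sensor_id), reading_time)
rows = [
    {"sensor_id": "s1", "reading_time": 3, "value": 20.5},
    {"sensor_id": "s2", "reading_time": 1, "value": 18.0},
    {"sensor_id": "s1", "reading_time": 1, "value": 19.9},
]
partitions = group_into_partitions(rows, ["sensor_id"], ["reading_time"])
for key, group in partitions.items():
    print(key, [r["reading_time"] for r in group])
```

In a real cluster the partition key is additionally hashed to choose a node; this sketch leaves that step out.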
Here are the detailed definitions from the Cassandra documentation:
- Partition Key: The first column declared in the PRIMARY KEY definition, or in the case of a compound key, multiple columns can declare those columns that form the primary key.
- Clustering Column: In the table definition, a clustering column is a column that is part of the compound primary key definition, but not the first column, which is the position reserved for the partition key. Columns are clustered in multiple rows within a single partition. The clustering order is determined by the position of columns in the compound primary key definition.
- Primary Key: One or more columns that uniquely identify a row in a table.
- compound Primary Key: A primary key consisting of the partition key, which determines on which node data is stored, and one or more additional columns that determine clustering.
A partition is defined by the partition key. As the definitions show, a partition can have multiple rows. This makes sense because the partition key is only part of the primary key, and it is the primary key that identifies a unique row in the table.
The location of each partition in the data file is saved in an index. Conceptually, an index is a map from partition key to the location of the partition, so finding a partition is roughly a get operation on a hash map. The rows in a partition are sorted by the clustering columns for efficient reads. Conceptually, locating a row within a partition involves some kind of binary search.
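The conceptual picture (a hash-map lookup followed by a binary search) can be sketched as a toy model. This is an illustration under stated assumptions, not how Cassandra lays out data on disk: the `PartitionStore` class is hypothetical, a Python dict stands in for the partition index, and a sorted list stands in for the rows of a partition.

```python
import bisect

class PartitionStore:
    """Toy model: a dict plays the partition index, sorted lists play a partition."""

    def __init__(self):
        # partition key -> (sorted clustering keys, values in the same order)
        self._index = {}

    def insert(self, partition_key, clustering_key, value):
        keys, values = self._index.setdefault(partition_key, ([], []))
        pos = bisect.bisect_left(keys, clustering_key)
        if pos < len(keys) and keys[pos] == clustering_key:
            values[pos] = value  # same primary key: overwrite the row
        else:
            keys.insert(pos, clustering_key)
            values.insert(pos, value)

    def get(self, partition_key, clustering_key):
        entry = self._index.get(partition_key)  # hash-map style partition lookup
        if entry is None:
            return None
        keys, values = entry
        pos = bisect.bisect_left(keys, clustering_key)  # binary search in partition
        if pos < len(keys) and keys[pos] == clustering_key:
            return values[pos]
        return None

store = PartitionStore()
store.insert("s1", 3, "c")
store.insert("s1", 1, "a")
store.insert("s1", 2, "b")
print(store.get("s1", 2))
```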
We can find some information about the index data in the description of SSTables. SSTables are the immutable data files that Cassandra uses to persist data on disk. Among other things, an SSTable contains:
- Data.db: The actual data, i.e. the contents of rows.
- Index.db: An index from partition keys to positions in the Data.db file. For wide partitions, this may also include an index to rows within a partition.
A wide partition is a partition with lots of data (i.e. many rows). We should avoid wide partitions because a partition is the fundamental unit of replication in Cassandra, and replicating it requires work and coordination. Keeping partitions small keeps that work and coordination incremental. Wide partitions are the root cause of multiple issues in Cassandra. A rule of thumb is not to go beyond 100 MB per partition; a good data model should keep partitions much smaller.
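As a back-of-the-envelope check against the 100 MB rule of thumb, the arithmetic can be sketched as below. The function names, the row counts, and the 50-byte average row size are hypothetical, and the estimate ignores storage overhead and compression, so treat it as a rough guide only.

```python
PARTITION_LIMIT_BYTES = 100 * 1024 * 1024  # the 100 MB rule of thumb

def estimated_partition_bytes(row_count, avg_row_bytes):
    """Rough size estimate: rows times average row size (overhead ignored)."""
    return row_count * avg_row_bytes

def is_wide(row_count, avg_row_bytes, limit=PARTITION_LIMIT_BYTES):
    return estimated_partition_bytes(row_count, avg_row_bytes) > limit

# Hypothetical workload: one reading per second, about 50 bytes per row
rows_per_day = 24 * 60 * 60
rows_per_year = 365 * rows_per_day
print("one partition per sensor per year:", is_wide(rows_per_year, 50))
print("one partition per sensor per day:", is_wide(rows_per_day, 50))
```

Under these assumptions a year of readings in one partition exceeds the limit, while a day of readings stays well below it.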
According to the Cassandra documentation:
An index provides a means to access data in Cassandra using attributes other than the partition key. The benefit is fast, efficient lookup of data matching a given condition. The index indexes column values in a separate, hidden table from the one that contains the values being indexed.
We often hear the terms primary index and secondary index. In Cassandra, the primary index is just the primary key. As we saw in the partition section previously, using the primary key (partition key + clustering columns) we can identify the location of a row. A secondary index is simply any index that is not the primary index.
Using secondary indexes in Cassandra is less straightforward than it appears. The main reason is that a given secondary index value may match rows in many partitions, which means Cassandra needs to query several nodes. Here are some online resources that discuss this issue:
- Cassandra - When to use an index
- Cassandra at Scale: The Problem with Secondary Indexes
- The sweet spot for Cassandra secondary indexing
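The fan-out problem can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not Cassandra's implementation: the four-node cluster, the `user_id` and `city` columns, and all function names are hypothetical, MD5 with a modulo stands in for the real partitioner, and every node is assumed to index only the rows it stores.

```python
import hashlib

NODE_COUNT = 4  # hypothetical cluster size

def node_for(partition_key):
    """Pick a node from a hash of the partition key (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NODE_COUNT

# Each node keeps its own rows and its own local index on "city".
node_rows = [{} for _ in range(NODE_COUNT)]
node_city_index = [{} for _ in range(NODE_COUNT)]

def insert(user_id, city):
    # Assumes each user_id is inserted once, so the index never goes stale.
    node = node_for(user_id)
    node_rows[node][user_id] = city
    node_city_index[node].setdefault(city, set()).add(user_id)

def get_by_user_id(user_id):
    """Lookup by partition key: exactly one node is contacted."""
    node = node_for(user_id)
    return node_rows[node].get(user_id), 1

def get_by_city(city):
    """Lookup by indexed column: every node's local index is consulted."""
    matches, contacted = set(), 0
    for index in node_city_index:
        contacted += 1
        matches |= index.get(city, set())
    return matches, contacted

for i in range(20):
    insert(f"user-{i}", "Paris" if i % 2 == 0 else "Oslo")

print(get_by_user_id("user-3"))
matches, contacted = get_by_city("Paris")
print(len(matches), "matches after contacting", contacted, "nodes")
```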
Cardinality is the number of unique values in a column. For example, a column of employee ID numbers, unique for each employee, would have high cardinality; a column of employee ZIP codes would have low cardinality.
An index on a column with low cardinality can boost read performance since the index is significantly smaller than the column. An index for a high-cardinality column may reduce performance. If your application requires a search on a high-cardinality column, a materialized view might be a better choice. (source)
Serial Consistency Level and Lightweight Transaction
A lightweight transaction is similar to a compare-and-set operation or a conditional write.
DELETE ... IF EXISTS
INSERT .... IF NOT EXISTS
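The compare-and-set behaviour of these two statements can be mimicked in a single process, as a sketch. The `ToyTable` class and its method names are hypothetical, and a lock stands in for the cross-replica agreement a real lightweight transaction needs (Cassandra uses Paxos for that); each method returns whether the condition held and the change was applied.

```python
import threading

class ToyTable:
    """Single-process stand-in for a table that supports conditional writes."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # makes check-then-write one atomic step

    def insert_if_not_exists(self, key, value):
        """Mimics INSERT ... IF NOT EXISTS: write only when the key is absent."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def delete_if_exists(self, key):
        """Mimics DELETE ... IF EXISTS: delete only when the key is present."""
        with self._lock:
            if key not in self._rows:
                return False
            del self._rows[key]
            return True

users = ToyTable()
print(users.insert_if_not_exists("alice", {"email": "a@example.com"}))
print(users.insert_if_not_exists("alice", {"email": "other@example.com"}))
print(users.delete_if_exists("alice"))
print(users.delete_if_exists("alice"))
```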
- Reading before writing
- Load balancers
- Too many tables
- Storing a big payload in a column of type text or blob is not wise. The recommended practical size is less than 1 MB, but try to keep it in the KB range.
- Select all or select count without a partition key causes a full table scan and should not be run on a big dataset.
- Multi-partition batch
- Collections are meant for storing/denormalizing a relatively small amount of data. It is an anti-pattern to use a (single) collection to store large amounts of data.
- Storing the whole entity as a single column blob.
Issues to be aware of
Notes from the white paper
- While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format.
- treats failures as the norm rather than the exception
- Every operation under a single row key is atomic per replica no matter how many columns are being read or written into.
- In addition to the actual data persistence component, the system needs to have the following characteristics:
- scalable and robust solutions for load balancing
- membership and failure detection,
- failure recovery
- replica synchronization,
- overload handling
- state transfer,
- concurrency and job scheduling
- request marshalling
- request routing
- system monitoring and alarming
- configuration management
- One of the key design features for Cassandra is the ability to scale incrementally.
- consistent hashing
- Each data item is replicated at N hosts.
- Each key, \(k\), is assigned to a coordinator node. The coordinator is in charge of the replication of the data items that fall within its range.
- Replication policies
- Cassandra system elects a leader amongst its nodes using Zookeeper.
- Cassandra uses a modified version of the \(\Phi\) Accrual Failure Detector.
- Accrual Failure Detectors are very good in both their accuracy and their speed and they also adjust well to network conditions and server load conditions.
- Typical write operation involves a write into a commit log for durability and recoverability and an update into an in-memory data structure.
- The write into the in-memory data structure is performed only after a successful write into the commit log.
- compaction process
- The Cassandra process on a single machine primarily consists of the following abstractions:
- partitioning module
- the cluster membership and failure detection module
- the storage engine module
- All system control messages rely on UDP-based messaging, while the application-related messages for replication and request routing rely on TCP.
- Consistent hashing
- Gossip protocol
- Hinted handoff: A hint is written to the coordinator node when a replica is down.
- Read repair: Background digest query on-read to find and update out-of-date replicas.
- SSTable storage
- columns, composites, counters, secondary indexes
- There is no primary node in Cassandra cluster but there is a coordinator for a given key.
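The notes on consistent hashing, the per-key coordinator, and replication at N hosts can be tied together in a short sketch. It is a simplified model, not Cassandra's code: the `Ring` class and node names are hypothetical, MD5 stands in for the real hash function, each node owns a single position on the ring, and replicas are simply the coordinator plus its next N-1 successors.

```python
import bisect
import hashlib

def token(value):
    """Map a string to a position on the ring (MD5 is a stand-in hash)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key):
        """Return the coordinator for key, then the next N-1 nodes clockwise."""
        # First node at or after the key's position, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
```

The first name in the returned list plays the role of the coordinator for that key. Adding a node changes the coordinator only for keys that fall into the new node's range, which is what makes incremental scaling possible.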
©2019 - 2022 all rights reserved