Redshift
- Amazon Redshift is a fast, petabyte-scale data warehouse service in the cloud
- OLAP vs OLTP
  - OLTP is online transaction processing. Use RDS for OLTP.
    - Write-heavy: many small, frequent transactions, each reading or writing a few rows
    - Example: an e-commerce website with a shopping cart
  - OLAP is online analytical processing. Use Redshift for OLAP.
    - Few writes and many reads, especially queries that aggregate an entire column based on conditions
    - Example: a query for the sum of all sales in January across all states in the south region
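A minimal sketch of the two workloads, using Python's built-in `sqlite3` as a stand-in for both RDS and Redshift so that it runs anywhere. The `sales` table, its columns, and the sample rows are hypothetical. The OLTP part commits one small transaction per order; the OLAP part is a single read that aggregates the `amount` column under conditions, matching the January/south-region example above.

```python
import sqlite3

# In-memory SQLite database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, state TEXT, region TEXT, "
    "sale_month TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each touching a single row
# (e.g. one checkout from a shopping cart).
orders = [
    ("TX", "south", "2024-01", 120.0),
    ("GA", "south", "2024-01", 80.0),
    ("TX", "south", "2024-02", 50.0),
    ("NY", "northeast", "2024-01", 200.0),
]
for state, region, month, amount in orders:
    with conn:  # each order commits as its own short transaction
        conn.execute(
            "INSERT INTO sales (state, region, sale_month, amount) "
            "VALUES (?, ?, ?, ?)",
            (state, region, month, amount),
        )

# OLAP-style work: one read that scans and aggregates a whole column
# under conditions ("sum of all January sales in the south region").
(total,) = conn.execute(
    "SELECT SUM(amount) FROM sales "
    "WHERE region = 'south' AND sale_month = '2024-01'"
).fetchone()
print(total)  # 200.0
```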
- Redshift uses columnar storage
  - Data is stored sequentially by column on disk, as opposed to by row as in RDS/OLTP databases
  - Suitable for aggregates across all records in a single column, since only that column has to be read
  - Suitable for compression, since all values in a column are of the same type
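To make the row-vs-column difference concrete, here is a toy sketch in plain Python (not how Redshift lays out blocks internally). The sales rows are hypothetical. Summing `amount` in the row layout has to step over every other field, while the column layout reads one contiguous list. Run-length encoding is used only as one simple illustrative compression scheme for a repetitive, same-typed column.

```python
from itertools import groupby

# Hypothetical rows of a sales table: (state, month, amount).
rows = [
    ("TX", "2024-01", 120),
    ("TX", "2024-01", 80),
    ("TX", "2024-02", 50),
    ("GA", "2024-02", 30),
]

# Row-oriented layout (RDS/OLTP style): values of one row sit together.
row_store = [value for row in rows for value in row]

# Column-oriented layout (Redshift style): values of one column sit together.
column_store = {
    "state": [r[0] for r in rows],
    "month": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Row store: the aggregate must skip over state and month; every third value is an amount.
row_store_total = sum(row_store[2::3])

# Column store: the aggregate reads one list and nothing else.
column_store_total = sum(column_store["amount"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Same-typed, repetitive column values compress well.
encoded_state = run_length_encode(column_store["state"])
print(row_store_total, column_store_total)  # 280 280
print(encoded_state)  # [('TX', 3), ('GA', 1)]
```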
- Configuration of Amazon Redshift
  - Start with a single node (max size 160 GB)
  - Upgrade to multi-node as your needs grow:
    - Leader node: manages client connections and acts as the front end that receives queries
    - Compute nodes: store data and execute queries. Up to 128 compute nodes can be deployed, depending on node type
  - Massively Parallel Processing (MPP): the load is distributed across many compute nodes that work in parallel
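The leader/compute split and MPP can be sketched as a toy simulation. This is an assumption-laden model, not Redshift's actual engine: rows are placed on a hypothetical number of compute nodes by hashing a distribution key (loosely in the spirit of key-based distribution), each node sums only its own rows, and a leader step combines the partial results. Threads stand in for separate machines; `crc32` is used instead of Python's `hash()` so placement is stable across runs.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_COMPUTE_NODES = 4  # hypothetical cluster size


def node_for(dist_key: str, num_nodes: int = NUM_COMPUTE_NODES) -> int:
    """Map a distribution key to a compute node with a stable hash."""
    return crc32(dist_key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes: int = NUM_COMPUTE_NODES):
    """Load-time step: place each (dist_key, amount) row on one compute node."""
    nodes = defaultdict(list)
    for dist_key, amount in rows:
        nodes[node_for(dist_key, num_nodes)].append(amount)
    return nodes


def node_partial_sum(amounts):
    """Compute-node step: aggregate only the rows stored on this node."""
    return sum(amounts)


def mpp_sum(rows, num_nodes: int = NUM_COMPUTE_NODES):
    nodes = distribute(rows, num_nodes)
    # Compute nodes work in parallel; threads stand in for separate machines.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_partial_sum, nodes.values()))
    # Leader-node step: combine the partial results into the final answer.
    return sum(partials)


orders = [(f"customer-{i % 10}", i) for i in range(1, 101)]
print(mpp_sum(orders))  # 5050
```

The answer does not depend on how many nodes the rows are spread over; adding nodes only shrinks each node's share of the scan.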
- Pricing
  - No charge for the leader node
  - Compute nodes are charged per node per hour
  - Backup storage is charged
  - Data transfer is metered and charged
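The pricing rules above reduce to simple arithmetic. A rough sketch, assuming a made-up hourly rate (real rates depend on node type and region) and treating backup and data-transfer charges as flat amounts supplied by the caller. The leader node has no term because it is not billed.

```python
HOURS_PER_MONTH = 730  # 8,760 hours in a year / 12


def estimate_monthly_cost(num_compute_nodes: int,
                          node_hourly_rate: float,
                          backup_cost: float = 0.0,
                          data_transfer_cost: float = 0.0,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Rough monthly bill: compute nodes + backups + data transfer.

    The leader node is free, so it contributes nothing here.
    """
    if num_compute_nodes < 1:
        raise ValueError("a cluster needs at least one compute node")
    compute_cost = num_compute_nodes * node_hourly_rate * hours
    return compute_cost + backup_cost + data_transfer_cost


# Two compute nodes at a hypothetical $0.25 per node-hour, no extras.
print(estimate_monthly_cost(2, 0.25))  # 365.0
```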
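- Availability
  - A cluster is deployed in a single AZ by default (high availability matters less here because OLAP systems are typically used by a small number of analysts and managers for reporting, not by customers in real time)
  - You can take snapshots and restore them as a new cluster in another AZ if needed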
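The snapshot-and-restore step can be scripted. A minimal sketch, assuming the `redshift` argument is a boto3 Redshift client (for example `boto3.client("redshift")`); it is passed in rather than created so the function can be exercised without AWS credentials. The cluster, snapshot, and AZ identifiers are placeholders for illustration.

```python
def snapshot_and_restore(redshift, cluster_id, snapshot_id, new_cluster_id, target_az):
    """Snapshot `cluster_id`, then restore it as `new_cluster_id` in `target_az`.

    `redshift` is expected to be a boto3 Redshift client.
    """
    # Take a manual snapshot of the existing cluster.
    redshift.create_cluster_snapshot(
        SnapshotIdentifier=snapshot_id,
        ClusterIdentifier=cluster_id,
    )
    # Block until the snapshot is available before restoring from it.
    redshift.get_waiter("snapshot_available").wait(
        SnapshotIdentifier=snapshot_id,
    )
    # Restore into a new cluster placed in the requested Availability Zone.
    response = redshift.restore_from_cluster_snapshot(
        ClusterIdentifier=new_cluster_id,
        SnapshotIdentifier=snapshot_id,
        AvailabilityZone=target_az,
    )
    return response["Cluster"]["ClusterIdentifier"]


# Usage (requires AWS credentials; identifiers are hypothetical):
#   import boto3
#   snapshot_and_restore(boto3.client("redshift"), "sales-dw", "sales-dw-snap",
#                        "sales-dw-restored", "us-east-1b")
```

Restoring creates a separate cluster; the original keeps running until you delete it.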