[CA5] Large and Fast; Exploiting Memory Hierarchy

November 8, 2023 5 분 소요

Introduction
DRAM Organization & Operation
Memory Hierarchy: Principle of Locality
Caches
Cache Block Size
Main Memory & Caches
Write Policies
Cache Performance
Associative Caches
Multilevel Caches
Cache’s Interaction with Advanced CPUs and Software

Introduction

memory hierarchy : A structure that uses multiple levels of memories. ; as the distance from the processor increases, the size of the memories and the access time both increase while the cost per bit decreases.

Alt text

cf. *Why* is **memory hierarchy** crucial? ![Alt text](/images/ca5_processor-memory-improvement.png) Even if the processor speed and memory latency are getting more and more improved, it still takes around 100+ CPU cycles for a DRAM access. The hierarial structure is the key to improve this.

DRAM Organization & Operation

DRAM configuration

strobe: A signal used to time reading that is output together with the data.
Block(line): Unit of data presented in caches.

Note: CPU register ‘line’ == word (4B)

Large capacity
- Arranged as 2D matrix
Narrow data interface: 1-16 bits (x1, x4, x8, x16)
- Cheap packages => Few bus pins.
- Pins are expensive.
Narrow address interface
- Multiplexed address strobe lines: row and column address
- Signaled by RAS# and CAS#, respectively. (RAS: Row Address Strobe, CAS: Column Address Strobe)

Alt text: Grid-like figure to represent how the DRAM is made up.

DRAM Organization

DRAM: set of banks
bank: independent memory request processing.
- # banks == parallelism degree.
Row buffer: bank-private cache area. (=> faster than DRAMs)
- 4-16kB in size
- Shared by all CPU cores
(cf. Rank = 4-16 banks)

Alt text

“각 bank에 동시에 access가 가능하면, spreading sequential data @ multiple banks is better.”

Alt text

DRAM Operations

(In? Column-first scheduling)

1) row-hit access 2) row-miss access

DRAM access scenario

Three column reads to row 0, one col read to row 1.

Alt text

Access scheduling sequence

#	RAS#?	CAS#?	Result
1	0	0	row-miss
2	0	1	row-hit
3	1	2	row-miss
4	1	2	row-miss

Explanation

1. Row miss: ACT, row activation
  - We should have one miss at the initial data accesses.
  - Right after the miss, the corresponding data portion is filled into the buffer.
  - Here, we accessed the data of row 1 of the DRAM data for the first time.
1. Row hit: We already had an ACT in access to row0 col0. Thus, the row buffer stores the Row0, which then we can use to access it later.
1. Row miss: ACT, because we haven’t loaded the Row1’s data into the buffer.
  - Then we fill ? Buffer
1. Row miss: “ACT”. Why?? (probably because the tag bit differs!)

In case of row-miss, the row buffer gets the data from the DRAM.

Alt text

Memory Hierarchy: Principle of Locality

(If a data location is referenced,)

terms:

temporal locality

The principle stating that if a location is referenced then it will tend to be referenced again soon.
spatial locality

The locality principle stating that if a data location is

Temporal Locality

Recently accessed items are likely to be accessed again soon.

e.g. looping variables (induction variables), instructions in a loop

Spatial Locality

Items near recently accessed one are likely to be referenced/accessed soon.

e.g. sequential instruction accesses (e.g. 4 bytes array), array data

Taking advantage of this,

Store everything on disk
Copy recently accessed (and nearby - spatial locality) items from disk to smaller DRAM memory
1. main memory
Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory (- temporal locality?)
1. Cache memory attached to CPU

memory hierarchy

Access Hits and Misses

Block(line): Unit of data presented in caches.
Hit: If accessed data is present in upper level. (Memory accesses found in a level of the memory hierarchy.)
- Hit ratio: hits/accesss.
Miss: If accessed data is absent in the upper level. (Memory accesses not found in a level of the memory hierarchy.)
- => Copy the block from the lower level
  - (=> Regenerate the access by copying it to the upper level.(?))
- Time taken: miss penalty: access to lower + access to upper
- Miss ratio: misses/accesses
  - = 1 - hit ratio
Then accessed data supplied from upper level

Caches

Cache memory: The level of the memory hierarchy closest to the CPU (except for the register(?))

Given accesses, we have 2 Questions:

Where do we look?
How do we know the presense of the data?

Cache organization

memory hierarchy

스스로 정리한 내용 + Tag array + Data Array

Address composition (of fields)
- tag
- (cache) index
- block offset
- word-byte offset.

Jaehyun Bhang

[CA5] Large and Fast; Exploiting Memory Hierarchy

Introduction

DRAM Organization & Operation

DRAM configuration

DRAM Organization

DRAM Operations

Memory Hierarchy: Principle of Locality

Temporal Locality

Spatial Locality

Caches

Cache organization

5.2. Memory Technologies

5.3. The Basics of Caches

5.4. Measuring and Improving Cache Performance

5.5. Dependable Memory Hierarchy

공유하기

참고

My PS Template

2023년 2학기 회고록

[NS24] Molecular Mechanisms of Learning and Memory

[CA6] Parallel Processors from Client to Cloud