Ceph Basic Operating Principles

2021. 2. 21. 23:16 · 🎯 OpenSource/Ceph

Overview

This post is a theoretical summary of the talk Dynamic Data Placement with Red Hat Ceph Storage, and the topic still needs further study. All images in this post were captured from that YouTube video to make the text easier to follow.

 

Contents

Basic Architecture

    1. RADOS (Reliable, Autonomic, Distributed Object Store)
      • The cluster that underlies Ceph; this is where the data is actually stored.
      • OSDs, monitors, managers, and other daemons run here.
    2. LIBRADOS
      • Lets applications access RADOS directly.
      • Avoids HTTP overhead.
      • Communicates with the RADOS cluster over sockets.
    3. RGW
      • A gateway for object storage, compatible with the S3 and Swift APIs.
      • As in the talk's diagram, when an S3 request arrives, the RADOS Gateway forwards it to librados.
    4. RBD (RADOS Block Device)
      • Implements distributed block storage while minimizing data loss.
      • In the talk's example of a cloud environment built on OpenStack, when a VM requests access to block storage, the hypervisor forwards that request to librbd.
      • It can be used with other open source projects as well, not only OpenStack.
    5. CEPHFS
      • Provides Ceph file storage.
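To make the LIBRADOS item concrete, here is a minimal sketch of an application talking to RADOS directly through Ceph's Python binding (the `rados` module). The function name `librados_roundtrip`, the default config path, and the pool and object names are assumptions chosen for illustration; running it needs a reachable cluster and a valid keyring.

```python
def librados_roundtrip(pool, name, data, conffile="/etc/ceph/ceph.conf"):
    """Write `data` to object `name` in `pool` via librados, then read it back.

    `pool`, `name` and `conffile` are illustrative assumptions; the pool must
    already exist on the cluster.
    """
    import rados  # Ceph's Python binding for librados, imported lazily

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()  # talks to the monitors, no HTTP layer involved
    try:
        ioctx = cluster.open_ioctx(pool)  # I/O context bound to one pool
        try:
            ioctx.write_full(name, data)  # replace the whole object
            return ioctx.read(name)       # reads up to the default length
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

A call such as `librados_roundtrip("mypool", "hello", b"world")` would go straight to the cluster, which is the path RGW and RBD also take underneath.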

 

OSD (Object Storage Daemon)

  1. One OSD runs on one disk.
  2. A cluster can run anywhere from tens to tens of thousands of OSDs, though at least 100 is recommended for performance.
  3. Stores objects on the underlying storage.
  4. Peers with other OSDs for replication, recovery, and rebalancing.

 

Monitor

  1. Maintains the state of the cluster and of its various maps.
  2. Because Ceph is a distributed system, something has to know the state of each OSD: whether it is running normally, which disk it is matched with, and what condition that disk is in.
  3. The monitor tracks these items, and that information determines where requests are sent.
  4. Deploy an odd number of monitors so that they can form a quorum.
  5. Monitors do not handle the stored data itself.
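The advice to run an odd number of monitors follows from majority arithmetic: a quorum needs a strict majority, so adding one monitor to an odd-sized group does not increase the number of failures it can survive. The helper names below are made up for this sketch.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


# Print monitor count, quorum size, and tolerated failures side by side.
for n in range(1, 8):
    print(n, quorum_size(n), tolerated_failures(n))
```

With 3 monitors the quorum is 2 and one failure is tolerated; with 4 monitors the quorum is 3 and still only one failure is tolerated.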

 

Manager

  1. A daemon that has been required since the Luminous release (it first appeared as an optional component in Kraken).
  2. Provides additional interfaces for external monitoring and management.
  3. Its module framework makes it easier to integrate with a variety of modules.
  4. Example modules (shown as a figure in the original post).

     

Object Placement with CRUSH

  1. CRUSH: Controlled Replication Under Scalable Hashing
    • To use an object, a client has to know which OSD to contact.
    • The CRUSH algorithm provides that information.
    • CRUSH is used in the same way regardless of the access method (object storage, block storage, the Ceph file system, and so on).
  2. PGs
    • Managing a huge number of objects individually is hard when they are replicated across more than 100 OSDs. Instead of tracking every object on its own, Ceph groups them into PGs (placement groups), which are much easier to handle.
      • To locate the OSDs that store an object, each pool is divided into subdivisions called placement groups.
      • An object's PG is chosen as: object name hash % number of PGs in the pool
      • A pool here means a logical partition of the cluster.
      • A Placement Group (PG) is a logical collection of objects that are replicated on OSDs to provide reliability in a storage system. Depending on the replication level of a Ceph pool, each PG is replicated and distributed on more than one OSD of a Ceph cluster. You can consider a PG as a logical container holding multiple objects, such that this logical container is mapped to multiple OSDs.
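The two-step lookup described above (object to PG, then PG to OSDs) can be sketched as follows. This is only an illustration under loud assumptions: `zlib.crc32` stands in for Ceph's own object-name hash, and the PG-to-OSD step uses plain rendezvous hashing instead of real CRUSH, which walks a weighted hierarchy of buckets. All function names are hypothetical.

```python
import hashlib
import zlib


def object_to_pg(object_name, pg_num):
    """object name hash % number of PGs in the pool (crc32 is a stand-in hash)."""
    return zlib.crc32(object_name.encode("utf-8")) % pg_num


def pg_to_osds(pool_id, pg, osd_ids, replicas=3):
    """Pick `replicas` distinct OSDs for a PG with rendezvous hashing.

    This is NOT the CRUSH algorithm, only a deterministic stand-in that any
    client can compute without asking a central lookup table.
    """
    def score(osd):
        key = f"{pool_id}.{pg}.{osd}".encode("utf-8")
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

    return sorted(osd_ids, key=score, reverse=True)[:replicas]


def locate(object_name, pool_id, pg_num, osd_ids, replicas=3):
    """Return (pg, osds) for an object: first hash to a PG, then map the PG."""
    pg = object_to_pg(object_name, pg_num)
    return pg, pg_to_osds(pool_id, pg, osd_ids, replicas)


osds = list(range(12))
pg, acting = locate("my-object", pool_id=1, pg_num=128, osd_ids=osds)
print(f"my-object -> PG {pg} of pool 1 -> OSDs {acting}")
```

The point of the indirection is that the cluster only has to track placement for a fixed number of PGs per pool, not for every object.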

 

Erasure Coding

There are two ways to store data safely.

  1. Replicated
    • Default: each PG in the pool is stored as 3 replicas.
  2. Erasure coded
    • Saves storage space.
    • However, when an OSD is lost, its data has to be rebuilt by reading chunks from the other OSDs.
      • This makes cluster recovery harder.
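The trade-off between the two schemes can be illustrated with a toy example. The sketch below compares raw-space cost and then uses a single XOR parity chunk, the simplest possible erasure code, to show why losing one chunk forces a read of all the survivors. It is an assumption-laden illustration, not Ceph's erasure-code plugin; the `k` and `m` values and all function names are chosen only for the example.

```python
from functools import reduce


def raw_per_user_byte_replicated(size=3):
    """Raw bytes stored per user byte with `size` full copies."""
    return float(size)


def raw_per_user_byte_ec(k, m):
    """Raw bytes stored per user byte with k data chunks and m coding chunks."""
    return (k + m) / k


def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def encode(data, k):
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [xor_chunks(chunks)]


def recover(chunks, lost_index):
    """Rebuild one lost chunk; note that every surviving chunk must be read."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return xor_chunks(survivors)


stored = encode(b"ceph stores objects", k=4)
assert recover(stored, lost_index=2) == stored[2]
print(raw_per_user_byte_replicated(3), raw_per_user_byte_ec(4, 2))
```

With 3 replicas every user byte costs 3 raw bytes, while a k=4, m=2 layout costs 1.5; the price is that a replicated pool can recover by copying one surviving replica, whereas the coded pool has to read and combine several chunks.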

 

References

Ceph CRUSH Map, Bucket algorithm

Ceph

Dynamic Data Placement with Red Hat Ceph Storage