재미있었다!!!! consistent hashing 작년쯤에 듣고 재밌다 싶었는데, 해쉬 여러 개 두는 아이디어 완전 짱 브릴리언트...
it's scalable, but not fast => you're doing something wrong
- past 10 years, database problems got worse
- because btree's do not in memory anymore - or they get pushed out of the memory; when we see this we are hosed
- web apps are taking off and the number of potential users are much greater than what RMDBs are designed against
- https are stateless thus scalable, but datas are stateful. database servers are not easy to scale.
- scalability: two kinds of components
- reads and writes
- very different performance characteristics
- reads are easier to scale
band-aids
- not actually scales, but does help
- band aid 1: btrees suck with rotational media -- in rot media sequential read/writes are much better -- btrees require random access: we can throw SSD at it!!!
- moore's law; 37signal's approach
- caches; memcached; ehcache
- partitioning (read) cache machines; we cannot randomly distribute them; so use hashing for that.
- what do we do with changing number of servers?
- solve by consistent hashing; use libketema
- we can even go further, and let servers have multiple tokens in the ring (wow... smart stuff)
- cold start problem; cold start => users reload => we are in hell
- partitioning (read) cache machines; we cannot randomly distribute them; so use hashing for that.
- cache coherence
- write-through; will cause a race condition. A writes a new value to the cache, cache dies, comes back up, B reads, fails, reads from database and pushes it into cache; if they come in wrong order, old value can be stuck in the database
- explicit invalidation; write to database, invalidate cache entry
- cache set invalidation; when a row maps to multiple keys in memcached, what do we do???
- generate a random prefix, effectively GUID
- write-back caching
- every 5 minutes, only reflect the newest value into cache
- only when we can endure losing data
- memcached can't do it, because iteration is impossible: teracotta, zookeeper
scalablilty
- replication; we can scale read by having multiple copies of database
- two topologies; master-slave and master-master
- master-slave; all writes go into masters first
- master-master; every server can accept writes and reconciles them
- two synch ways: sync/async?
- can mix in four ways;
- sync-master-slave
- sync-multi-master; slower (all masters should recognize writes), PGCluster, Oracle, latency is high, complex
- async-master-slave; easiest. MySQL replication, MongoDB, Redis, etc etc
- async-multi-master; we reconcile writes after writes. conflict resolution is quadratic or even cubic (research.microsoft.com/~gray/replicas.pdf) as you add nodes. Tokyo Cabinet does this. MySQL cluster does this.
- asyncs can lose data if the master fails
- replication does not scale writes
- two topologies; master-slave and master-master
- partitioning
- sharding -- horizontal, key based
- functional partitioning -- vertical
- horizontal: each piece of the database has different set of rows
- is actually hard; blogger.com -- users, blogs, comments. what I shard according to user keys? blogs and comments made to the blogs must reside on the same machine; but we can't show all comments made by a specific user
- central db that server owns a key; single point of failure
- mongoDB uses this
- vertical partitioning: tables on separate nodes
- we can do both, actually
- all of these are HARD!!!
NoSQL
- Distributed; Cassandra, HBase, Voldemort, (MongoDB?)
- Not distributed; Neo4j, CouchDB, Redis
- CouchDB distributeness is not real distributedness
- Two famous papers; bigtable, and dynamo
- bigtable; how can we build a database on top of GFS?
- dynamo: how can we build a distributed hash table for data center?
- Cassandra (open space 5:30pm tmr)
- I'll go ask about our use cases




그거라면 CRUD 트랜잭션 성능을 높이는 데는 쓸모가 있을지 몰라도 운영부하가 지옥임.
어느 정도의 규모를 가진 서비스 대상인지는 몰라도 (초대형이면 그냥 후반부 분산DB)
그렇지 않으면 Scale-Up하고 메모리 많이 박고 캐시 많이 때려박은 컨트롤러 장착한 적절한 머신이 차라리 나음..