leveldb Code Walkthrough: The Insert Operation - 白红宇

Published: 2019-05-10

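When leveldb inserts data, it always writes the log file first and then puts the data into the cache (the in-memory memtable).

Before that, however, some preprocessing takes place:

1 The data to be written is wrapped in a Writer, the Writer is appended to the write queue, and it waits for its turn to write.

2 Check whether the cache is full and whether a "checkpoint" (flushing the cache to a file) is needed.

3 leveldb's cache has two states: the current state and the read-only state.

When the cache fills up and has to be written to a file, it is switched to the read-only state, and the file write and compaction are carried out on it.

So before every file write, we must first wait for the previous read-only cache to finish its job.

4 New data has to be written to a level 0 file, and the number of level 0 files is limited.

When the soft limit is reached, the writer sleeps for 1 millisecond to yield CPU time to the compaction in progress.

When the hard limit is reached, the writer simply waits.

5 Maintain the cache state, maintain the file number, create the new file, and so on.

6 Try to schedule a compaction.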
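
The checks in steps 2 to 4 can be condensed into one small decision function. What follows is a minimal C++ sketch, not leveldb's own code: `ToyState`, `Action`, and `NextAction` are hypothetical names, the thresholds 8 and 12 are assumed to match leveldb's default soft and hard limits, and the `force` flag of the real function is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the state that the checks look at.
struct ToyState {
  int level0_files;         // current number of level 0 files
  std::size_t mem_usage;    // bytes used by the current cache
  std::size_t buffer_size;  // capacity of the current cache
  bool has_imm;             // a read-only cache is still being flushed
};

enum class Action {
  kDelayOnce,      // soft limit hit: sleep 1 ms, then re-check
  kProceed,        // the current cache has room: just write
  kWaitForImm,     // cache full, previous read-only cache not flushed yet
  kWaitForLevel0,  // hard limit hit: wait for compaction
  kSwitchCache     // make the current cache read-only and start a new one
};

constexpr int kSlowdownTrigger = 8;  // soft limit (assumed default)
constexpr int kStopTrigger = 12;     // hard limit (assumed default)

// Evaluates the checks in the same order as the steps listed above.
Action NextAction(const ToyState& s, bool allow_delay) {
  assert(s.buffer_size > 0);
  if (allow_delay && s.level0_files >= kSlowdownTrigger) {
    return Action::kDelayOnce;
  }
  if (s.mem_usage <= s.buffer_size) return Action::kProceed;
  if (s.has_imm) return Action::kWaitForImm;
  if (s.level0_files >= kStopTrigger) return Action::kWaitForLevel0;
  return Action::kSwitchCache;
}
```

A caller would loop on `NextAction`, passing `allow_delay = false` after the first `kDelayOnce` so that a single write is never delayed twice.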

The public Put function

Status DBImpl::Put(const WriteOptions& o, const Slice& key, const Slice& val) {

return DB::Put(o, key, val);

}

Status DB::Put(const WriteOptions& opt, const Slice& key, const Slice& value) {

WriteBatch batch;

batch.Put(key, value);

return Write(opt, &batch);

}
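
To make the wrapping explicit: a single-key Put is just a one-entry batch sent down the batch write path. The sketch below illustrates that idea with hypothetical `ToyBatch` and `ToyDB` classes. The one-byte length prefix is a simplification chosen for this example and is not leveldb's actual encoding.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for WriteBatch: entries are appended to one string.
class ToyBatch {
 public:
  void Put(const std::string& key, const std::string& value) {
    assert(key.size() < 256 && value.size() < 256);  // toy 1-byte lengths
    rep_.push_back(static_cast<char>(key.size()));
    rep_ += key;
    rep_.push_back(static_cast<char>(value.size()));
    rep_ += value;
    ++count_;
  }
  int Count() const { return count_; }
  const std::string& Contents() const { return rep_; }

 private:
  std::string rep_;  // encoded entries, back to back
  int count_ = 0;    // number of entries in rep_
};

// Hypothetical stand-in for the database: only the write path is modeled.
class ToyDB {
 public:
  // A single-key Put builds a one-entry batch and reuses Write().
  void Put(const std::string& key, const std::string& value) {
    ToyBatch batch;
    batch.Put(key, value);
    Write(batch);
  }
  // The batch path: append the encoded batch to the log, count its entries.
  void Write(const ToyBatch& batch) {
    log_ += batch.Contents();
    entries_ += batch.Count();
  }
  int entries() const { return entries_; }
  const std::string& log() const { return log_; }

 private:
  std::string log_;
  int entries_ = 0;
};
```

Routing every write through one function means that queuing, logging, and sequence numbering only have to be implemented once.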

The real entry point

Status DBImpl::Write(const WriteOptions& options, WriteBatch* my_batch) {

// Wrap the WriteBatch in a Writer and set a few options.

Writer w(&mutex_);

w.batch = my_batch;

w.sync = options.sync;

w.done = false;

MutexLock l(&mutex_);

// Append the Writer to the write queue and wait until it reaches
// writers_.front(), or until another writer has already done its work.

writers_.push_back(&w);

while (!w.done && &w != writers_.front()) {

w.cv.Wait();

}

if (w.done) {

return w.status;

}

// May temporarily unlock and wait.

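// MakeRoomForWrite checks a series of conditions the write depends on,
// for example:
//   whether the number of level 0 files exceeds the limit,
//   whether the cache still has room, i.e. whether a file write is really needed,
//   whether the imm cache can be released,
//   whether a compaction is needed.
// When all conditions hold and a file write is really required, it
// generates a new file number, creates a new logfile, turns the current
// cache into the imm cache, and so on.
// The argument my_batch == NULL means: if my_batch is empty, the caller
// wants MakeRoomForWrite to attempt a compaction.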

Status status = MakeRoomForWrite(my_batch == NULL);

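// The sequence number advances by one for every entry written (see the
// Count(updates) addition below), not by one per batch.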

uint64_t last_sequence = versions_->LastSequence();

Writer* last_writer = &w;

// my_batch may be NULL; an empty batch can be used to make
// MakeRoomForWrite perform a compaction manually.

if (status.ok() && my_batch != NULL) {
// NULL batch is for compactions

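// A WriteBatch holds a string rep_ that stores the data already encoded
// in the storage format.
// BuildBatchGroup looks for other WriteBatches in writers_ and
// concatenates their rep_ into a single WriteBatch,
// but the final rep_ must not grow beyond 1 << 20 bytes.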

WriteBatch* updates = BuildBatchGroup(&last_writer);

// WriteBatchInternal is a utility class made up of static functions.

WriteBatchInternal::SetSequence(updates, last_sequence + 1);

last_sequence += WriteBatchInternal::Count(updates);

// Add to log and apply to memtable. We can release the lock

// during this phase since &w is currently responsible for logging

// and protects against concurrent loggers and concurrent writes

// into mem_.

{

mutex_.Unlock();

// Write the logfile.

status = log_->AddRecord(WriteBatchInternal::Contents(updates));

bool sync_error = false;

if (status.ok() && options.sync) {

status = logfile_->Sync();

if (!status.ok()) {

sync_error = true;

}

}

if (status.ok()) {

// Put the data into the cache.

status = WriteBatchInternal::InsertInto(updates, mem_);

}

mutex_.Lock();

if (sync_error) {

// The state of the log file is indeterminate: the log record we

// just added may or may not show up when the DB is re-opened.

// So we force the DB into a mode where all future writes fail.

RecordBackgroundError(status);

}

}

if (updates == tmp_batch_) tmp_batch_->Clear();

// Update the sequence number.

versions_->SetLastSequence(last_sequence);

}

while (true) {

Writer* ready = writers_.front();

writers_.pop_front();

if (ready != &w) {

ready->status = status;

ready->done = true;

ready->cv.Signal();

}

if (ready == last_writer) break;

}

// Notify new head of write queue

if (!writers_.empty()) {

writers_.front()->cv.Signal();

}

return status;

}
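
Two details of Write are easy to miss: the writer at the front of the queue merges the batches queued behind it, and the sequence number advances by the number of entries rather than by one per group. The following single-threaded C++ sketch illustrates both under simplifying assumptions. `ToyPending`, `ToyGroup`, `BuildGroup`, and `CommitGroup` are hypothetical names, and the size cap is passed in as a parameter instead of being derived the way leveldb does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical queued batch: encoded payload plus its entry count.
struct ToyPending {
  std::string rep;
  int count;
};

// Hypothetical result of merging several queued batches.
struct ToyGroup {
  std::string rep;
  int count = 0;
  std::size_t writers = 0;  // how many queue entries were merged
};

// Merges queued batches front to back. The first batch (the leader's own)
// is always taken; later ones are added only while the merged payload
// stays within max_bytes.
ToyGroup BuildGroup(const std::deque<ToyPending>& queue,
                    std::size_t max_bytes) {
  assert(!queue.empty());
  ToyGroup g;
  for (const ToyPending& p : queue) {
    if (g.writers > 0 && g.rep.size() + p.rep.size() > max_bytes) break;
    g.rep += p.rep;
    g.count += p.count;
    ++g.writers;
  }
  return g;
}

// The group's first entry gets last_sequence + 1 and every entry consumes
// one sequence number. Returns the new last sequence.
std::uint64_t CommitGroup(std::uint64_t last_sequence, const ToyGroup& g,
                          std::uint64_t* first_sequence) {
  *first_sequence = last_sequence + 1;
  return last_sequence + static_cast<std::uint64_t>(g.count);
}
```

With a queue holding payloads of 4, 4, and 8 bytes and a cap of 8 bytes, the first two batches are merged and the third is left for the next leader, which is why Write pops writers only up to `last_writer`.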

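MakeRoom is needed because new data has to be written to a level 0 data file, and the number of level 0 files is limited.

A compaction may be required to bring the number of level 0 files down.

At the same time, the current cache has to become the imm cache, so we must check whether the previous imm cache is still occupying that slot.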

Status DBImpl::MakeRoomForWrite(bool force) {

mutex_.AssertHeld();

assert(!writers_.empty());

// Decide whether sleeping is allowed in order to yield system resources to compaction.

bool allow_delay = !force;

Status s;

while (true) {

if (!bg_error_.ok()) {

// Yield previous error

s = bg_error_;

break;

} else if (

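// If delaying is allowed and the number of level 0 files has reached
// kL0_SlowdownWritesTrigger (8), sleep for 1 millisecond to yield the
// CPU to the compaction thread.
// After one sleep allow_delay is set to false, so this write will not
// sleep again.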

allow_delay &&

versions_->NumLevelFiles(0) >= config::kL0_SlowdownWritesTrigger) {

// We are getting close to hitting a hard limit on the number of

// L0 files. Rather than delaying a single write by several

// seconds when we hit the hard limit, start delaying each

// individual write by 1ms to reduce latency variance. Also,

// this delay hands over some CPU to the compaction thread in

// case it is sharing the same core as the writer.

mutex_.Unlock();

env_->SleepForMicroseconds(1000);

allow_delay = false; // Do not delay a single write more than once

mutex_.Lock();

} else if (!force &&

// While the cache is not full, do not write a file yet.

(mem_->ApproximateMemoryUsage() <= options_.write_buffer_size)) {

// There is room in current memtable

break;

} else if (imm_ != NULL) {

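// leveldb has two kinds of cache. One is the current cache, the one new
// data is being written to right now.
// When that cache is full and has to be written to a file, it becomes
// the immutable (imm) cache, a read-only cache managed through the
// pointer imm_.
// The imm cache serves queries and compaction.
// If an imm cache already exists, we must wait until its file has been
// written out before the current cache can become the imm cache.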

// We have filled up the current memtable, but the previous

// one is still being compacted, so we wait.

Log(options_.info_log, "Current memtable full; waiting...\n");

bg_cv_.Wait();

} else if (versions_->NumLevelFiles(0) >= config::kL0_StopWritesTrigger) {

// The hard limit on the number of level 0 files has been reached; no new file can be written.

// There are too many level-0 files.

Log(options_.info_log, "Too many L0 files; waiting...\n");

bg_cv_.Wait();

} else {

// All checks passed; the real work starts here.

// Attempt to switch to a new memtable and trigger compaction of old

assert(versions_->PrevLogNumber() == 0);

// Generate a new logfile number.

uint64_t new_log_number = versions_->NewFileNumber();

WritableFile* lfile = NULL;

// Create the new file.

s = env_->NewWritableFile(LogFileName(dbname_, new_log_number), &lfile);

if (!s.ok()) {

// Avoid chewing through file number space in a tight loop.

versions_->ReuseFileNumber(new_log_number);

break;

}

delete log_;

delete logfile_;

// Point logfile_ at the new file and record the new log number.

logfile_ = lfile;

logfile_number_ = new_log_number;

log_ = new log::Writer(lfile);

// Turn the current cache into the imm cache and create a new current cache.

imm_ = mem_;

has_imm_.Release_Store(imm_);

mem_ = new MemTable(internal_comparator_);

mem_->Ref();

force = false; // Do not force another compaction if have room

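// Schedule a compaction if one is needed.
// This call performs a few checks and then goes through a callback;
// the function that finally does the work is DBImpl::BackgroundCompaction().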

MaybeScheduleCompaction();

}

}

return s;

}
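
The final branch of MakeRoomForWrite, where the current cache becomes the imm cache, can be modeled in a few lines. This is a hedged sketch rather than leveldb's implementation: `ToyMemState` and its methods are hypothetical, a `std::map` stands in for the memtable, and `FinishFlush` stands in for the background thread that writes the imm cache to a level 0 file.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

using ToyTable = std::map<std::string, std::string>;

// Hypothetical model of the two-cache state: mem_ takes writes, imm_ is
// the read-only cache waiting to be flushed.
class ToyMemState {
 public:
  ToyMemState() : mem_(std::make_unique<ToyTable>()) {}

  void Insert(const std::string& k, const std::string& v) { (*mem_)[k] = v; }

  // Returns false when the previous read-only cache is still pending, in
  // which case the caller has to wait.
  bool SwitchMemTable() {
    if (imm_ != nullptr) return false;
    log_number_ = next_file_number_++;    // new log file for the new cache
    imm_ = std::move(mem_);               // current cache becomes read-only
    mem_ = std::make_unique<ToyTable>();  // fresh current cache
    return true;
  }

  // Stand-in for the background flush completing. In this toy the data is
  // simply dropped; a real store would now serve it from a file.
  void FinishFlush() {
    assert(imm_ != nullptr);
    imm_.reset();
  }

  // Reads consult the current cache first, then the read-only one.
  bool Get(const std::string& k, std::string* v) const {
    return Find(mem_.get(), k, v) || Find(imm_.get(), k, v);
  }

  std::uint64_t log_number() const { return log_number_; }

 private:
  static bool Find(const ToyTable* t, const std::string& k, std::string* v) {
    if (t == nullptr) return false;
    auto it = t->find(k);
    if (it == t->end()) return false;
    *v = it->second;
    return true;
  }

  std::unique_ptr<ToyTable> mem_;
  std::unique_ptr<ToyTable> imm_;
  std::uint64_t next_file_number_ = 1;
  std::uint64_t log_number_ = 0;
};
```

Keeping a single read-only slot is what forces a writer to wait when the current cache fills up before the previous one has been flushed.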

Source: ITPUB blog, link: http://blog.itpub.net/26239116/viewspace-1847246/. If you repost this article, please credit the source; otherwise legal action may be taken.
