Understanding Preallocated Space in Zookeeper Transaction Logs

Preface

Zookeeper persists its in-memory state through snapshots and transaction logs, which record the details of every request.

This is especially true of the transaction log, which has to be written every time a transaction request is processed.

By default, the transaction log is stored as files on disk, so Zookeeper's overall performance is bounded by how fast it can write to those files.

What does Zookeeper do to mitigate this bottleneck? That is what this article looks into.

1. Preallocating the transaction log

To see how entries are added to the transaction log, start with FileTxnLog.append():

public class FileTxnLog implements TxnLog {
  volatile BufferedOutputStream logStream = null;
  volatile OutputArchive oa;
  volatile FileOutputStream fos = null;
 
  // Append an entry to the transaction log
  public synchronized boolean append(TxnHeader hdr, Record txn)
    throws IOException
  {
    if (hdr == null) {
      return false;
    }

    if (hdr.getZxid() <= lastZxidSeen) {
      LOG.warn("Current zxid " + hdr.getZxid()
          + " is <= " + lastZxidSeen + " for "
          + hdr.getType());
    } else {
      lastZxidSeen = hdr.getZxid();
    }

    // logStream is null by default
    if (logStream==null) {
     if(LOG.isInfoEnabled()){
        LOG.info("Creating new log file: " + Util.makeLogName(hdr.getZxid()));
     }

      // The code below creates the transaction log file:
      // the file name is derived from the current zxid, then the file header is written
     logFileWrite = new File(logDir, Util.makeLogName(hdr.getZxid()));
     fos = new FileOutputStream(logFileWrite);
     logStream=new BufferedOutputStream(fos);
     oa = BinaryOutputArchive.getArchive(logStream);
     FileHeader fhdr = new FileHeader(TXNLOG_MAGIC,VERSION, dbId);
     fhdr.serialize(oa, "fileheader");
     // Make sure that the magic number is written before padding.
     logStream.flush();
     filePadding.setCurrentSize(fos.getChannel().position());
     streamsToFlush.add(fos);
    }
    // Preallocation happens here
    filePadding.padFile(fos.getChannel());
    byte[] buf = Util.marshallTxnEntry(hdr, txn);
    if (buf == null || buf.length == 0) {
      throw new IOException("Faulty serialization for header " +
          "and txn");
    }
    Checksum crc = makeChecksumAlgorithm();
    crc.update(buf, 0, buf.length);
    oa.writeLong(crc.getValue(), "txnEntryCRC");
    Util.writeTxnBytes(oa, buf);

    return true;
  }
}

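When a FileTxnLog object is created, its logStream field is null, so the first transaction request it processes causes a new file to be created, named after the current transaction ID (zxid).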
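To make the bytes written by append() concrete, here is a minimal sketch of the file layout. Several details are assumptions taken from my reading of the ZooKeeper source rather than from the code above: the file name is "log." plus the zxid in hex, TXNLOG_MAGIC is the ASCII bytes "ZKLG", VERSION is 2, makeChecksumAlgorithm() returns an Adler32, and Util.writeTxnBytes() writes a 4-byte length, the payload and a 0x42 end-of-record byte. The class TxnLogLayoutSketch and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.Checksum;

// Hypothetical helper mimicking the layout written by FileTxnLog.append().
// Assumed (not confirmed by the article): header = magic "ZKLG" (int) + version 2 (int)
// + dbId (long) = 16 bytes; entry = checksum (long) + length (int) + payload + 0x42.
class TxnLogLayoutSketch {
    static final int MAGIC =
        ByteBuffer.wrap("ZKLG".getBytes(StandardCharsets.US_ASCII)).getInt();
    static final int VERSION = 2;

    // File name derived from the first zxid stored in the file.
    static String logName(long zxid) {
        return "log." + Long.toHexString(zxid);
    }

    // Big-endian header, the same byte order DataOutput uses.
    static byte[] header(long dbId) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeInt(VERSION);
            out.writeLong(dbId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // One framed entry around an already-serialized transaction payload.
    static byte[] entry(byte[] payload) {
        Checksum crc = new Adler32();
        crc.update(payload, 0, payload.length);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(crc.getValue()); // "txnEntryCRC"
            out.writeInt(payload.length);  // length prefix
            out.write(payload);
            out.writeByte(0x42);           // end-of-record marker 'B'
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(logName(0x100000001L));      // log.100000001
        System.out.println(header(0L).length);           // 16
        System.out.println(entry(new byte[100]).length); // 113 = 8 + 4 + 100 + 1
    }
}
```

Under these assumptions, the header is why the channel position, and therefore the size handed to filePadding.setCurrentSize(), is not zero for a freshly created file.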

1.1 Transaction log preallocation

public class FilePadding {
  long padFile(FileChannel fileChannel) throws IOException {
    // For a new file, newFileSize is about 64MB: preAllocSize plus the header bytes already written
    long newFileSize = calculateFileSizeWithPadding(fileChannel.position(), currentSize, preAllocSize);
    if (currentSize != newFileSize) {
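      // Extend the file to newFileSize: fill is a 1-byte zero buffer written at offset
      // newFileSize - 1, so the gap before it reads back as zeros
      fileChannel.write((ByteBuffer) fill.position(0), newFileSize - fill.remaining());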
      currentSize = newFileSize;
    }
    return currentSize;
  }
 
  // Compute the new file size
  public static long calculateFileSizeWithPadding(long position, long fileSize, long preAllocSize) {
    // If preAllocSize is positive and we are within 4KB of the known end of the file calculate a new file size
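    // For a new file, position and fileSize both equal the header length (the channel position after
    // the header was flushed); preAllocSize is set by a system property and defaults to 64MB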
    if (preAllocSize > 0 && position + 4096 >= fileSize) {
      // If we have written more than we have previously preallocated we need to make sure the new
      // file size is larger than what we already have
      // Q: I honestly don't understand this part...
      if (position > fileSize) {
        fileSize = position + preAllocSize;
        fileSize -= fileSize % preAllocSize;
      } else {
        fileSize += preAllocSize;
      }
    }

    return fileSize;
  }
}

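The preallocation logic is simple: if the write position is within 4096 bytes (4KB) of the current file size, the file is extended by another preAllocSize.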
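The padding step can be reproduced outside Zookeeper with plain java.nio. The sketch below copies the size rule from calculateFileSizeWithPadding and, like padFile(), grows the file with a single positional write of one zero byte. PaddingSketch and its 16-byte stand-in header are hypothetical, and the temp file exists only for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical re-implementation of the padding idea; not ZooKeeper's FilePadding class.
class PaddingSketch {
    // Same rule as FilePadding.calculateFileSizeWithPadding quoted above.
    static long sizeWithPadding(long position, long fileSize, long preAllocSize) {
        if (preAllocSize > 0 && position + 4096 >= fileSize) {
            if (position > fileSize) {
                fileSize = position + preAllocSize;
                fileSize -= fileSize % preAllocSize;
            } else {
                fileSize += preAllocSize;
            }
        }
        return fileSize;
    }

    // Grows the file by writing one zero byte at offset newSize - 1.
    // A positional write does not move the channel's own position.
    static long pad(FileChannel ch, long currentSize, long preAllocSize) throws IOException {
        long newSize = sizeWithPadding(ch.position(), currentSize, preAllocSize);
        if (newSize != currentSize) {
            ch.write(ByteBuffer.allocate(1), newSize - 1);
        }
        return newSize;
    }

    // Writes a 16-byte stand-in header to a temp file, pads once, returns the file size.
    static long demo(long preAllocSize) {
        try {
            Path tmp = Files.createTempFile("txnlog-sketch", ".log");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ch.write(ByteBuffer.allocate(16));    // channel position is now 16
                pad(ch, ch.position(), preAllocSize); // currentSize starts at 16
                return ch.size();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long preAlloc = 64L * 1024 * 1024;                                  // 64MB default
        System.out.println(sizeWithPadding(16, 16, preAlloc));              // 67108880
        System.out.println(sizeWithPadding(1_000_000, 67108880, preAlloc)); // 67108880, far from the end
        System.out.println(demo(64 * 1024));                                // 65552
    }
}
```

Because FileChannel.write(ByteBuffer, long) leaves the channel position alone, later entries still start right after the header while the file on disk is already preallocated.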

Q:

One thing I still don't understand here: in what scenario does position > fileSize occur?
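One scenario that seems to reach this branch, reasoning only from the code above: padFile() checks the position before an entry is written, so if the position is more than 4KB from the end (no padding) and the bytes that reach the channel before the next check exceed the remaining space, for example a single entry close to 1MB, position ends up past fileSize. Since writes go through a BufferedOutputStream, the channel position may also jump by several entries at once when the buffer is flushed. The branch then rounds the new size to the next multiple of preAllocSize above position. This is a hypothesis with illustrative numbers, and OverrunSketch is hypothetical.

```java
// Hypothetical walk-through of the "position > fileSize" branch with illustrative numbers.
class OverrunSketch {
    static final long KB = 1024, MB = 1024 * KB;

    // Same rule as FilePadding.calculateFileSizeWithPadding quoted above.
    static long sizeWithPadding(long position, long fileSize, long preAllocSize) {
        if (preAllocSize > 0 && position + 4096 >= fileSize) {
            if (position > fileSize) {
                fileSize = position + preAllocSize;
                fileSize -= fileSize % preAllocSize; // round down to a multiple of preAllocSize
            } else {
                fileSize += preAllocSize;
            }
        }
        return fileSize;
    }

    public static void main(String[] args) {
        long preAlloc = 64 * MB;
        long fileSize = 64 * MB + 16;       // after the first padding of a new file
        long position = fileSize - 10 * KB; // more than 4KB from the end

        fileSize = sizeWithPadding(position, fileSize, preAlloc);
        System.out.println(fileSize);       // 67108880: unchanged, no padding yet

        position += 1 * MB;                 // a ~1MB entry lands before the next check
        fileSize = sizeWithPadding(position, fileSize, preAlloc);
        System.out.println(fileSize);       // 134217728: 128MB, next multiple of 64MB above position
    }
}
```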

2. When a new transaction log file is created

From the code above, a new transaction log file is created whenever logStream is null. So when does logStream become null?

Searching the code, FileTxnLog.rollLog() is the only method that explicitly sets logStream to null:

public class FileTxnLog implements TxnLog {
  public synchronized void rollLog() throws IOException {
    if (logStream != null) {
      this.logStream.flush();
      this.logStream = null;
      oa = null;
    }
  }
}
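In effect, logStream doubles as a lazy-open flag: rollLog() only flushes and drops the stream, and the next append() opens a file named after that append's zxid. The toy model below shows this with no real I/O; RollSketch is hypothetical and the StringBuilder merely stands in for the output stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of how FileTxnLog uses "logStream == null" to start a new file.
class RollSketch {
    private StringBuilder logStream;          // stands in for the BufferedOutputStream
    final List<String> files = new ArrayList<>();

    void append(long zxid) {
        if (logStream == null) {
            // a new file named after the first zxid it will contain
            files.add("log." + Long.toHexString(zxid));
            logStream = new StringBuilder();
        }
        logStream.append(zxid).append('\n');
    }

    void rollLog() {
        if (logStream != null) {
            // the real code flushes here; the next append() then creates a new file
            logStream = null;
        }
    }

    public static void main(String[] args) {
        RollSketch log = new RollSketch();
        log.append(0x100000001L);
        log.append(0x100000002L);
        log.rollLog();
        log.append(0x100000003L);
        System.out.println(log.files); // [log.100000001, log.100000003]
    }
}
```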

Following this lead, let's trace the callers of rollLog():

SyncRequestProcessor.run() -> ZKDatabase.rollLog() -> FileTxnSnapLog.rollLog() -> FileTxnLog.rollLog()

The call ultimately originates in SyncRequestProcessor.run(), and this is the only call chain. Let's analyze it.

2.1 SyncRequestProcessor.run()

public class SyncRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
 public void run() {
    try {
      int logCount = 0;

      setRandRoll(r.nextInt(snapCount/2));
      while (true) {
        ...
        if (si != null) {
          // Append to the transaction log
          if (zks.getZKDatabase().append(si)) {
            logCount++;
            if (logCount > (snapCount / 2 + randRoll)) {
              setRandRoll(r.nextInt(snapCount/2));
              // Note: rollLog is triggered here
              zks.getZKDatabase().rollLog();
              ...
            }
          } else if (toFlush.isEmpty()) {
            ...
          }
          toFlush.add(si);
          if (toFlush.size() > 1000) {
            flush(toFlush);
          }
        }
      }
    } catch (Throwable t) {
      handleException(this.getName(), t);
      running = false;
    }
    LOG.info("SyncRequestProcessor exited!");
  }
}

Note the condition under which rollLog() runs: logCount > (snapCount / 2 + randRoll).

snapCount is a system property, read via System.getProperty("zookeeper.snapCount"), with a default value of 100000.

randRoll is a random value, r.nextInt(snapCount/2), so it lies in the range [0, snapCount/2).

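So the condition fires only after more than 50,000 transaction requests have been logged: with the default snapCount, somewhere between 50,001 and 100,000.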
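A small sketch of the roll condition, showing the earliest and latest points at which it can fire. Reading snapCount with Integer.getInteger("zookeeper.snapCount", 100000) is an assumption chosen to match the default described above; RollThresholdSketch and txnsUntilRoll are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of the rollLog() trigger in SyncRequestProcessor.run().
class RollThresholdSketch {
    // Number of successful appends after which rollLog() fires for a given randRoll.
    static int txnsUntilRoll(int snapCount, int randRoll) {
        int logCount = 0;
        while (true) {
            logCount++;                                  // one successful append
            if (logCount > (snapCount / 2 + randRoll)) { // same condition as above
                return logCount;
            }
        }
    }

    public static void main(String[] args) {
        int snapCount = Integer.getInteger("zookeeper.snapCount", 100000);
        int randRoll = new Random().nextInt(snapCount / 2); // 0 .. snapCount/2 - 1
        System.out.println("this run rolls after " + txnsUntilRoll(snapCount, randRoll) + " txns");
        System.out.println(txnsUntilRoll(snapCount, 0));                 // earliest: 50001 by default
        System.out.println(txnsUntilRoll(snapCount, snapCount / 2 - 1)); // latest: 100000 by default
    }
}
```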

This leads to something I can't explain:

Looking at the transaction log files, they are all 64MB, yet Zookeeper only allocates a new log file after at least 50,000 transaction requests. Is snapCount simply an empirical value?

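If the values in the transaction requests are large, the log could exceed 64MB well before 50,000 transactions, and in theory a new file should be created, yet apparently none is. How is this case handled?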

If the values are small, even 50,000 transactions will not fill 64MB, so part of the preallocated space is wasted.
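To put rough numbers on these two cases, the model below replays the padding rule from calculateFileSizeWithPadding before each append and stops at the earliest possible roll (50,001 entries with the default snapCount). It assumes a fixed on-disk entry size and that every entry reaches the channel immediately, ignoring BufferedOutputStream buffering; PreallocUsageSketch is hypothetical. Under these assumptions, the padFile() code quoted earlier keeps extending the same file by another preAllocSize as writes approach its end, so large values would grow one file in 64MB steps rather than trigger a new file. This is a model of the quoted code, not a measurement.

```java
// Hypothetical model: padding rule applied before every append, as append() does;
// assumes each entry reaches the channel at once (BufferedOutputStream is ignored).
class PreallocUsageSketch {
    // Same rule as FilePadding.calculateFileSizeWithPadding quoted above.
    static long sizeWithPadding(long position, long fileSize, long preAllocSize) {
        if (preAllocSize > 0 && position + 4096 >= fileSize) {
            if (position > fileSize) {
                fileSize = position + preAllocSize;
                fileSize -= fileSize % preAllocSize;
            } else {
                fileSize += preAllocSize;
            }
        }
        return fileSize;
    }

    // File size after n entries of entryBytes each, starting after a 16-byte header.
    static long finalFileSize(int n, long entryBytes, long preAllocSize) {
        long position = 16;
        long fileSize = 16; // currentSize = channel position after the header
        for (int i = 0; i < n; i++) {
            fileSize = sizeWithPadding(position, fileSize, preAllocSize); // padFile()
            position += entryBytes;                                      // write the entry
        }
        return fileSize;
    }

    public static void main(String[] args) {
        long preAlloc = 64L * 1024 * 1024;
        int earliestRoll = 50_001;                   // default snapCount with randRoll = 0
        System.out.println(preAlloc / earliestRoll); // 1342: break-even bytes per entry
        System.out.println(finalFileSize(earliestRoll, 200, preAlloc));   // 67108880: mostly unused
        System.out.println(finalFileSize(earliestRoll, 4_000, preAlloc)); // 201326608: three 64MB steps
    }
}
```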