Mysteriously slow writes on a production HBase cluster

Technology 09-16 Source: 若泽大数据

1. Background

Monitoring of the KSSH program showed that writes to HBase had become abnormally and inexplicably slow, causing batches to pile up.

2. Errors

2.1 Spark job log:

22/07/27 01:01:37 INFO Metrics: Initializing metrics system: phoenix
22/07/27 01:01:37 INFO MetricsConfig: loaded properties from hadoop-metrics2.properties
22/07/27 01:01:37 INFO MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
22/07/27 01:01:37 INFO MetricsSystemImpl: phoenix metrics system started
22/07/27 01:01:37 INFO deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
22/07/27 01:01:38 INFO CachedKafkaConsumer: Initial fetch for spark-executor-kssh_v1 kssh 1 284382547
22/07/27 01:01:38 INFO AbstractCoordinator: Discovered coordinator hadoop38:9092 (id: 2147483459 rack: null) for group spark-executor-kssh_v1.
22/07/27 01:01:40 INFO deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
22/07/27 01:01:40 INFO deprecation: dfs.socket.timeout is deprecated. Instead, use dfs.client.socket-timeout
22/07/27 01:01:40 INFO deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
22/07/27 01:01:40 INFO deprecation: dfs.socket.timeout is deprecated. Instead, use dfs.client.socket-timeout
22/07/27 01:01:40 INFO DefaultMetricsCollector: Configured metrics report to emit every 60 seconds
22/07/27 01:02:18 INFO AsyncProcess: #1, waiting for 2 actions to finish on table: JYDW:OMS_ORDERINFOITEM
22/07/27 01:02:18 INFO AsyncProcess: Left over 2 task(s) are processed on server(s): [hadoop49,60020,1550503850620]
22/07/27 01:02:18 INFO AsyncProcess: Regions against which left over task(s) are processed: [JYDW:OMS_ORDERINFOITEM,,1509703183405.f4a8235d947d1fa0f358dd1c789f2d97.]
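The log above goes quiet between 01:01:40 and 01:02:18, just before AsyncProcess reports actions still waiting on hadoop49. A minimal sketch for finding such silences in a longer executor log follows; the function name `largest_gap` and the sample list are illustrative, and it assumes every relevant line starts with the `yy/MM/dd HH:mm:ss` timestamp seen here.

```python
from datetime import datetime

def largest_gap(lines):
    """Return (seconds, line) for the log line preceded by the longest silence."""
    stamps = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        stamps.append((ts, line))
    best = (0.0, None)
    for (prev, _), (cur, line) in zip(stamps, stamps[1:]):
        gap = (cur - prev).total_seconds()
        if gap > best[0]:
            best = (gap, line)
    return best

# A few lines copied from the Spark job log above.
sample = [
    "22/07/27 01:01:38 INFO AbstractCoordinator: Discovered coordinator hadoop38:9092 (id: 2147483459 rack: null) for group spark-executor-kssh_v1.",
    "22/07/27 01:01:40 INFO DefaultMetricsCollector: Configured metrics report to emit every 60 seconds",
    "22/07/27 01:02:18 INFO AsyncProcess: #1, waiting for 2 actions to finish on table: JYDW:OMS_ORDERINFOITEM",
    "22/07/27 01:02:18 INFO AsyncProcess: Left over 2 task(s) are processed on server(s): [hadoop49,60020,1550503850620]",
]
gap, line = largest_gap(sample)
print(gap, line)
```

On these lines the longest silence is the 38 seconds before the first AsyncProcess message, which is where the client is waiting on the RegionServer.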

2.2 hadoop56 RS log:

2022-07-27 13:33:29,964 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful for zhonggang (auth:SIMPLE)
2022-07-27 13:33:29,964 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 192.168.6.75 port: 50808 with version info: version: "1.2.0-cdh5.12.0" url: "file:///data/jenkins/workspace/generic-binary-tarball-and-maven-deploy/CDH5.12.0-Packaging-HBase-2017-06-29_04-13-35/hbase-1.2.0-cdh5.12.0" revision: "Unknown" user: "jenkins" date: "Thu Jun 29 04:37:42 PDT 2017" src_checksum: "6834049453a9459ccaf4cadbf9a54b2c"
2022-07-27 13:35:46,847 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3167ms
GC pool 'G1 Young Generation' had collection(s): count=2 time=3197ms
GC pool 'G1 Old Generation' had collection(s): count=1 time=8012ms
2022-07-27 13:36:12,361 INFO org.apache.hadoop.hbase.io.hfile.LruBlockCache: totalSize=6.01 MB, freeSize=5.59 GB, max=5.60 GB, blockCount=1, accesses=130, hits=130, hitRatio=100.00%, , cachingAccesses=129, cachingHits=129, cachingHitsRatio=100.00%, evictions=176, evicted=2, evictedPerRun=0.011363636702299118
2022-07-27 13:36:13,312 INFO org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: failedBlockAdditions=0, totalSize=2.00 GB, freeSize=2.00 GB, usedSize=55 KB, cacheSize=38.78 KB, accesses=5168, hits=0, IOhitsPerSecond=0, IOTimePerHit=NaN, hitRatio=0,cachingAccesses=10, cachingHits=0, cachingHitsRatio=0,evictions=0, evicted=0, evictedPerRun=NaN
2022-07-27 13:36:42,457 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful fo
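The JvmPauseMonitor entry is the most telling part of this log: a pause of about 3 seconds, with an 8-second old-generation collection in the same window. The sketch below totals such entries over a RegionServer log. The regular expressions are written against the lines shown above only, and `summarize_pauses` is a hypothetical helper name.

```python
import re

PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")
POOL_RE = re.compile(r"GC pool '([^']+)' had collection\(s\): count=(\d+) time=(\d+)ms")

def summarize_pauses(lines):
    """Collect detected pause lengths and per-pool (count, total ms) from RS log lines."""
    pauses = []
    pools = {}
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            pauses.append(int(m.group(1)))
        m = POOL_RE.search(line)
        if m:
            name, count, ms = m.group(1), int(m.group(2)), int(m.group(3))
            old_count, old_ms = pools.get(name, (0, 0))
            pools[name] = (old_count + count, old_ms + ms)
    return pauses, pools

# The three GC-related lines from the hadoop56 RS log above.
sample = [
    "2022-07-27 13:35:46,847 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3167ms",
    "GC pool 'G1 Young Generation' had collection(s): count=2 time=3197ms",
    "GC pool 'G1 Old Generation' had collection(s): count=1 time=8012ms",
]
pauses, pools = summarize_pauses(sample)
print(pauses)
print(pools)
```

Run over a full day of logs, the per-pool totals give a rough idea of how much wall-clock time the RegionServer spent in GC.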

3. Tuning attempts:

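Stopped the KSSH program from writing to HBase and asked other departments and colleagues to stop using HBase for the time being, so that the cluster received no read or write requests.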

3.1 Restarted the HBase cluster; no change.

3.2 After the restart, regions in transition (RIT) were found; repaired them with the hbck (fsck) command; no change.

$ hbase hbck --help

Usage: fsck [opts] {only tables}
 where [opts] are:
   -help Display help options (this)
   -details Display full report of all regions.
   -timelag <timeInSeconds>  Process only regions that  have not experienced any metadata updates in the last  <timeInSeconds> seconds.
   -sleepBeforeRerun <timeInSeconds> Sleep this many seconds before checking if the fix worked if run with -fix
   -summary Print only summary of the tables and status.
   -metaonly Only check the state of the hbase:meta table.
   -sidelineDir  HDFS path to backup existing meta.
   -boundaries Verify that regions boundaries are the same between META and store files.
   -exclusive Abort if another hbck is exclusive or fixing.
  Metadata Repair options: (expert features, use with caution!)
   -fix              Try to fix region assignments.  This is for backwards compatiblity
   -fixAssignments   Try to fix region assignments.  Replaces the old -fix
   -fixMeta          Try to fix meta problems.  This assumes HDFS region info is good.
   -noHdfsChecking   Don't load/check region info from HDFS. Assumes hbase:meta region info is good. Won't check/fix any HDFS issue, e.g. hole, orphan, or overlap
   -fixHdfsHoles     Try to fix region holes in hdfs.
   -fixHdfsOrphans   Try to fix region dirs with no .regioninfo file in hdfs
   -fixTableOrphans  Try to fix table dirs with no .tableinfo file in hdfs (online mode only)
   -fixHdfsOverlaps  Try to fix region overlaps in hdfs.
   -fixVersionFile   Try to fix missing hbase.version file in hdfs.
   -maxMerge <n>     When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default)
   -sidelineBigOverlaps  When fixing region overlaps, allow to sideline big overlaps
   -maxOverlapsToSideline <n>  When fixing region overlaps, allow at most <n> regions to sideline per group. (n=2 by default)
   -fixSplitParents  Try to force offline split parents to be online.
   -removeParents    Try to offline and sideline lingering parents and keep daughter regions.
   -ignorePreCheckPermission  ignore filesystem permission pre-check
   -fixReferenceFiles  Try to offline lingering reference store files
   -fixEmptyMetaCells  Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows)
  Datafile Repair options: (expert features, use with caution!)
   -checkCorruptHFiles     Check all Hfiles by opening them to make sure they are valid
   -sidelineCorruptHFiles  Quarantine corrupted HFiles.  implies -checkCorruptHFiles
  Metadata Repair shortcuts
   -repair           Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps -fixReferenceFiles -fixTableLocks -fixOrphanedTableZnodes
   -repairHoles      Shortcut for -fixAssignments -fixMeta -fixHdfsHoles
  Table lock options
   -fixTableLocks    Deletes table locks held for a long time (hbase.table.lock.expire.ms, 10min by default)
  Table Znode options
   -fixOrphanedTableZnodes    Set table state in ZNode to disabled if table does not exists
 Replication options
   -fixReplication   Deletes replication queues for removed peers


$ hbase hbck /

ERROR: Region { meta => TESTDW:PUB_DELIVERREGIONRULE_IDX,,1548815156590.31cb7b9e86574a9ec05db1b2fe5916f6., hdfs => hdfs://nameservice1/hbase/data/TESTDW/PUB_DELIVERREGIONRULE_IDX/31cb7b9e86574a9ec05db1b2fe5916f6, deployed => , replicaId => 0 } not deployed on any region server.
.........
.........
22/07/27 16:39:14 INFO util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
ERROR: There is a hole in the region chain between  and . You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table TESTDW:PUB_DELIVERREGIONRULE_IDX

$ hbase hbck -repair

3.3 Tuned the MemStore; no change.

RS heap memory: 22G

3.3.1 MemStore-level limit:

When any single MemStore in a Region reaches its limit (hbase.hregion.memstore.flush.size, 256MB), a MemStore flush is triggered.

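3.3.2 Region-level limit:

When the combined size of all MemStores in a Region reaches its limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size, 3 * 256M = 768M with the values used here), updates to the Region are blocked and a MemStore flush is triggered.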

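3.3.3 RegionServer-level limit:

When the combined size of all MemStores on a RegionServer reaches the upper limit (hbase.regionserver.global.memstore.upperLimit * hbaseheapsize = 0.45 * 22G = 9.9G, i.e. 45% of the JVM heap), some MemStores are flushed. Flushing proceeds from the largest MemStore to the smallest: the Region with the largest MemStore is flushed first, then the next largest, until total MemStore usage falls below the lower threshold. In Apache HBase and in CDH before 5.8.0, that threshold is hbase.regionserver.global.memstore.lowerLimit * hbaseheapsize, 38% of the JVM heap by default. From CDH 5.8.0 on, hbase.regionserver.global.memstore.lowerLimit=0.92 is a fraction of the MemStore memory, so flushing starts once 92% of the MemStore memory is in use.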
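The three limits in 3.3.1 to 3.3.3 can be worked out with a few lines of arithmetic. This is only a sketch using the values quoted above; it assumes that "22G" and "256M" mean GiB and MiB, and the variable names are made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

heap = 22 * GIB                 # RS heap memory
flush_size = 256 * MIB          # hbase.hregion.memstore.flush.size
block_multiplier = 3            # hbase.hregion.memstore.block.multiplier
upper_limit = 0.45              # hbase.regionserver.global.memstore.upperLimit
lower_limit_fraction = 0.92     # lowerLimit, as a fraction of MemStore memory (CDH 5.8.0+)

region_block = block_multiplier * flush_size        # Region-level limit
global_upper = upper_limit * heap                   # RegionServer-level upper limit
global_lower = lower_limit_fraction * global_upper  # point where forced flushing starts

# How many Regions can hold a full 256M MemStore before the global limit is hit.
full_memstores = int(global_upper // flush_size)

print(f"Region-level limit:  {region_block / MIB:.0f} MiB")
print(f"Global upper limit:  {global_upper / GIB:.2f} GiB")
print(f"Global lower limit:  {global_lower / GIB:.2f} GiB")
print(f"Full MemStores that fit under the upper limit: {full_memstores}")
```

With these settings roughly 39 Regions with a full MemStore are enough to reach the global limit, so on a RegionServer hosting many actively written Regions the global limit can be reached before any single MemStore fills up.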

3.3.4 When the number of HLogs on a RegionServer reaches its limit (configurable through hbase.regionserver.maxlogs=32), the system picks the Region or Regions that belong to the oldest HLog and flushes them.

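3.3.5 HBase flushes MemStores periodically:

The default period is 1 hour, which ensures that no MemStore goes unpersisted for long. To avoid the problems caused by every MemStore flushing at the same moment, each periodic flush is given a random delay of around 20000 ms.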

3.3.6 Manual flush: from the HBase shell, flush 'tablename' flushes a table and flush 'region name' flushes a single Region.

http://hbasefly.com/2016/03/23/hbase-memstore-flush/

3.4 Cleared the HBase data in ZooKeeper and restarted HBase; no change.

[root@hadoop38 bin]# ./zkCli.sh

[zk: localhost:2181(CONNECTED) 0] ls /

[zookeeper, yarn-leader-election, hadoop-ha, rmstore, kafka, hbase]
[zk: localhost:2181(CONNECTED) 1] rmr /hbase

3.5 Shut down HBase, repaired the meta table, started HBase; no change.

$ hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair

3.6 Searched the Apache issue tracker for HBase; found little useful information.

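3.7 The cluster currently runs HBase-1.2.0-CDH5.12.0. Cloudera's site shows that the later releases, CDH5.12.1~5.16.1, fix many HBase issues.

I decided to keep an upgrade as the very last resort; to be honest, I was not sure that upgrading would solve the problem either.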

CDH5.12.1-5.16.1-changes.log

4. Inspecting GC and tuning the JVM

4.1 Deploying jstatd:

vi /usr/java/jdk1.8.0_45/jstatd.all.policy
grant codebase "file:${java.home}/../lib/tools.jar" {
permission java.security.AllPermission; 
}; 
nohup jstatd -J-Djava.security.policy=/usr/java/jdk1.8.0_45/jstatd.all.policy &

4.2 Master JAVA OPTS:

-Dcom.sun.management.jmxremote.port=8998
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false

4.3 RS JAVA OPTS (take effect after restarting HBase):

-XX:+UseG1GC 
-XX:InitiatingHeapOccupancyPercent=60 
-XX:-ResizePLAB 
-XX:MaxGCPauseMillis=200 
-XX:+UnlockDiagnosticVMOptions 
-XX:+G1SummarizeConcMark 
-XX:+ParallelRefProcEnabled 
-XX:G1HeapRegionSize=32m 
-XX:G1HeapWastePercent=20 
-XX:ConcGCThreads=8 
-XX:ParallelGCThreads=16 
-XX:MaxTenuringThreshold=15 
-XX:G1MixedGCCountTarget=64 
-XX:+UnlockExperimentalVMOptions 
-XX:G1NewSizePercent=3 
-XX:G1OldCSetRegionThresholdPercent=5
-Dcom.sun.management.jmxremote.port=8999 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-XX:MetaspaceSize=200M
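To get a feel for what the percentage-based G1 flags above mean in absolute terms, the sketch below converts them to sizes. It assumes the RS heap is the 22G mentioned in section 3.3 (read as GiB), which the option list itself does not state; the dictionary keys are illustrative names, not JVM terms.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

heap_bytes = 22 * GIB      # assumption: RS heap from section 3.3
region_bytes = 32 * MIB    # -XX:G1HeapRegionSize=32m

heap_regions = heap_bytes // region_bytes
settings = {
    # -XX:InitiatingHeapOccupancyPercent=60: occupancy at which concurrent marking starts
    "marking_start_gib": 60 / 100 * heap_bytes / GIB,
    # -XX:G1NewSizePercent=3: minimum young generation size
    "min_young_gib": 3 / 100 * heap_bytes / GIB,
    # -XX:G1HeapWastePercent=20: reclaimable garbage tolerated before mixed GCs are skipped
    "tolerated_waste_gib": 20 / 100 * heap_bytes / GIB,
    # -XX:G1OldCSetRegionThresholdPercent=5: cap on old regions collected per mixed GC
    "max_old_regions_per_mixed_gc": heap_regions * 5 // 100,
}

print("heap regions:", heap_regions)
for name, value in settings.items():
    print(name, round(value, 2))
```

Under these assumptions, marking starts at about 13.2G of heap occupancy. The MemStore upper limit from 3.3.3 (9.9G) plus the LruBlockCache maximum reported in the RS log (5.60 GB) add up to more than that, which may be worth checking when judging whether the 60% threshold suits this heap.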

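4.4 Connected VisualVM to the RS process on hadoop55.

4.4.1 Before tuning, full GCs were frequent.

4.4.2 CPU usage exceeded 100% and the load was very high.

4.4.3 After tuning, things improved slightly, but writes were still very slow.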

5. Thread dump of the RS process

"IPC Client (1179689991) connection to hadoop36/192.168.17.36:60000 from hbase" - Thread t@37484
java.lang.Thread.State: TIMED_WAITING
at java.lang.Object.wait(Native Method)
- waiting on <3ea53ce0> (a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.waitForWork(RpcClientImpl.java:551)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:566)

Locked ownable synchronizers:
- None

 

"RS_OPEN_REGION-hadoop56:60020-87" - Thread t@1201

java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <7c36101b> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- None

rs dump.log: analysis of the dump file shows a large number of threads in the WAITING state.
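Counting thread states by hand is tedious on a large dump. The sketch below tallies them and also marks threads that are parked in ThreadPoolExecutor.getTask, that is, pool workers simply waiting for their next task. It assumes the dump layout shown above (a quoted thread name line followed by a java.lang.Thread.State line); `parse_threads` is a hypothetical helper name.

```python
import re
from collections import Counter

NAME_RE = re.compile(r'^"([^"]+)"')
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")

def parse_threads(dump_text):
    """Return one dict per thread: name, state, and whether it is an idle pool worker."""
    threads = []
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        name = NAME_RE.match(line)
        if name:
            current = {"name": name.group(1), "state": None, "idle_pool_worker": False}
            threads.append(current)
        elif current is not None:
            state = STATE_RE.search(line)
            if state:
                current["state"] = state.group(1)
            elif line.startswith("at ") and "ThreadPoolExecutor.getTask" in line:
                current["idle_pool_worker"] = True
    return threads

# Abridged copy of the two threads shown in the dump above.
sample_dump = '''
"IPC Client (1179689991) connection to hadoop36/192.168.17.36:60000 from hbase" - Thread t@37484
java.lang.Thread.State: TIMED_WAITING
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.waitForWork(RpcClientImpl.java:551)

"RS_OPEN_REGION-hadoop56:60020-87" - Thread t@1201

java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
'''

threads = parse_threads(sample_dump)
states = Counter(t["state"] for t in threads)
idle = [t["name"] for t in threads if t["idle_pool_worker"]]
print(states)
print(idle)
```

Separating idle pool workers from the rest matters because a worker parked in LinkedBlockingQueue.take is in the WAITING state even when nothing is wrong, so the raw WAITING count on its own says little about where writes are stuck.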