Consistency models used in practice fall broadly into two categories: weak consistency and strong consistency.
Weak consistency is also known as eventual consistency.
Strong consistency is usually implemented in one of two ways (contrasted in the sketch after this list):
1. Master-slave synchronization
The master accepts the write request
The master replicates the log to the slaves
The master waits until all slaves have acknowledged
2. Majority (quorum) protocols
Paxos
Raft (multi-paxos)
ZAB (multi-paxos)
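The practical difference between the two approaches is what the writer waits for. Below is a minimal, hypothetical sketch; replicate() stands in for a real replication call, and none of these names come from any library or from the code discussed later:

// Hypothetical sketch contrasting "wait for all slaves" with "wait for a majority".
import java.util.List;

public class ReplicationSketch {

    // Master-slave synchronization: the write only succeeds after EVERY slave
    // has acknowledged, so a single slow or failed slave blocks the write.
    static boolean masterSlaveWrite(List<String> slaves, String logEntry) {
        for (String slave : slaves) {
            if (!replicate(slave, logEntry)) {
                return false;
            }
        }
        return true;
    }

    // Majority (quorum) write: the leader counts itself and returns as soon as
    // more than half of the cluster has acknowledged.
    static boolean quorumWrite(List<String> otherPeers, String logEntry) {
        int clusterSize = otherPeers.size() + 1; // the leader itself plus its peers
        int majority = clusterSize / 2 + 1;
        int acks = 1;                            // the leader's own copy counts as one ack
        for (String peer : otherPeers) {
            if (replicate(peer, logEntry) && ++acks >= majority) {
                return true;                     // no need to wait for the slowest nodes
            }
        }
        return acks >= majority;
    }

    static boolean replicate(String node, String logEntry) {
        return true; // placeholder for a real replication RPC
    }
}

Waiting for every slave gives the strongest guarantee but stalls on any single failure; waiting for a majority tolerates a minority of failed or slow nodes, which is why Paxos, Raft, and ZAB all rely on quorums.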
The Paxos algorithm is a message-passing-based consensus algorithm proposed by Leslie Lamport in 1990. Its main variants are Basic Paxos, Multi Paxos, and Fast Paxos.
1. Roles
Client: a role outside the system, the originator of requests. Analogous to the public.
Proposer: accepts Client requests, puts proposals (propose) to the cluster, and acts as a conflict mediator when conflicts occur. Analogous to a member of parliament who submits bills on behalf of the public.
Acceptor: votes on and accepts proposals; a proposal is only accepted once a quorum (usually a majority) is formed. Analogous to the parliament.
Learner: a recipient of accepted proposals, serving as a backup; it has no effect on cluster consistency. Analogous to a record keeper.
2.步骤、阶段(phases)
Phase 1a:Prepare proposer提出一个提议,编号为N,此N大于这个proposer之前提出的提案编号。请求acceptors的quorum接受。
Phase 1b:Promise 如果N大于此acceptor之前接受的任何提案编号则接受,否则拒绝。
Phase 2a:Accept 如果达到了多数派,proposer会发出 accept请求,此请求包含提案编号N,以及提案内容。
Phase 2b:Accepted 如果此acceptor在此期间没有收到任何编号大于N的提案,则接受此提案内容,否则忽略。
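The two phases are easiest to see from the acceptor's point of view. Below is a minimal, illustrative sketch of the state an acceptor keeps and how it answers Prepare and Accept requests; class and method names are made up, and this is not a complete Paxos implementation:

// Minimal, single-node sketch of a Paxos acceptor's state machine.
public class AcceptorSketch {

    private long promisedN = -1;   // highest proposal number promised so far
    private long acceptedN = -1;   // proposal number of the last accepted value
    private Object acceptedValue;  // the last accepted value, if any

    // Phase 1b: promise not to accept proposals numbered below n,
    // and report any value already accepted so the proposer can adopt it.
    public synchronized Promise onPrepare(long n) {
        if (n > promisedN) {
            promisedN = n;
            return new Promise(true, acceptedN, acceptedValue);
        }
        return new Promise(false, acceptedN, acceptedValue); // reject / Nack
    }

    // Phase 2b: accept the proposal unless a higher-numbered Prepare arrived in between.
    public synchronized boolean onAccept(long n, Object value) {
        if (n >= promisedN) {
            promisedN = n;
            acceptedN = n;
            acceptedValue = value;
            return true; // Accepted(n, value)
        }
        return false;    // ignore / Nack
    }

    public record Promise(boolean ok, long acceptedN, Object acceptedValue) {}
}

Returning the already-accepted value inside the Promise is what forces a later proposer to adopt it, which is how Paxos preserves a chosen value across rounds.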
3. Basic flow
Normal flow
This diagram represents the case of a first round, which is successful (i.e. no process in the network fails).
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare(1)
| |<---------X--X--X | | Promise(1,{Va,Vb,Vc})
| X--------->|->|->| | | Accept!(1,V)
| |<---------X--X--X------>|->| Accepted(1,V)
|<---------------------------------X--X Response
| | | | | | |
In the following diagram, one of the Acceptors in the Quorum fails, so the Quorum size becomes 2. In this case,
the Basic Paxos protocol still succeeds.
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare(1)
| | | | ! | | !! FAIL !!
| |<---------X--X | | Promise(1,{Va, Vb, null})
| X--------->|->| | | Accept!(1,V)
| |<---------X--X--------->|->| Accepted(1,V)
|<---------------------------------X--X Response
| | | | | |
In the following case, one of the (redundant) Learners fails, but the Basic Paxos protocol still succeeds.
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare(1)
| |<---------X--X--X | | Promise(1,{Va,Vb,Vc})
| X--------->|->|->| | | Accept!(1,V)
| |<---------X--X--X------>|->| Accepted(1,V)
| | | | | | ! !! FAIL !!
|<---------------------------------X Response
| | | | | |
In this case, a Proposer fails after proposing a value, but before the agreement is reached. Specifically, it fails in the middle of the Accept
message, so only one Acceptor of the Quorum receives the value. Meanwhile, a new Leader (a Proposer) is elected (but this is not shown in detail).
Note that there are 2 rounds in this case (rounds proceed vertically, from the top to the bottom).
Client Proposer Acceptor Learner
| | | | | | |
X----->| | | | | | Request
| X------------>|->|->| | | Prepare(1)
| |<------------X--X--X | | Promise(1,{Va, Vb, Vc})
| | | | | | |
| | | | | | | !! Leader fails during broadcast !!
| X------------>| | | | | Accept!(1,V)
| ! | | | | |
| | | | | | | !! NEW LEADER !!
| X--------->|->|->| | | Prepare(2)
| |<---------X--X--X | | Promise(2,{V, null, null})
| X--------->|->|->| | | Accept!(2,V)
| |<---------X--X--X------>|->| Accepted(2,V)
|<---------------------------------X--X Response
| | | | | | |
4. Potential problem: livelock
How livelock occurs:
The most complex case is when multiple Proposers believe themselves to be Leaders. For instance, the current leader may fail and later recover,
but the other Proposers have already re-selected a new leader. The recovered leader has not learned this yet and attempts to begin one round in
conflict with the current leader. In the diagram below, 4 unsuccessful rounds are shown, but there could be more (as suggested at the bottom of
the diagram).
Client Leader Acceptor Learner
| | | | | | |
X----->| | | | | | Request
| X------------>|->|->| | | Prepare(1)
| |<------------X--X--X | | Promise(1,{null,null,null})
| ! | | | | | !! LEADER FAILS
| | | | | | | !! NEW LEADER (knows last number was 1)
| X--------->|->|->| | | Prepare(2)
| |<---------X--X--X | | Promise(2,{null,null,null})
| | | | | | | | !! OLD LEADER recovers
| | | | | | | | !! OLD LEADER tries 2, denied
| X------------>|->|->| | | Prepare(2)
| |<------------X--X--X | | Nack(2)
| | | | | | | | !! OLD LEADER tries 3
| X------------>|->|->| | | Prepare(3)
| |<------------X--X--X | | Promise(3,{null,null,null})
| | | | | | | | !! NEW LEADER proposes, denied
| | X--------->|->|->| | | Accept!(2,Va)
| | |<---------X--X--X | | Nack(3)
| | | | | | | | !! NEW LEADER tries 4
| | X--------->|->|->| | | Prepare(4)
| | |<---------X--X--X | | Promise(4,{null,null,null})
| | | | | | | | !! OLD LEADER proposes, denied
| X------------>|->|->| | | Accept!(3,Vb)
| |<------------X--X--X | | Nack(4)
| | | | | | | | ... and so on ...
Solution: when a conflict occurs, the Proposer waits for a random timeout (usually a few seconds) before resubmitting its proposal.
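A minimal sketch of that backoff idea (illustrative only; tryRound() is an assumed stand-in for one full Prepare/Accept round, not part of any real library):

// Illustrative sketch of randomized backoff to break Paxos livelock.
import java.util.concurrent.ThreadLocalRandom;

public class ProposerBackoffSketch {

    public void proposeWithBackoff(long initialN) throws InterruptedException {
        long n = initialN;
        while (!tryRound(n)) {
            // On conflict, sleep for a random few seconds so that competing
            // proposers are unlikely to collide again on the next attempt.
            Thread.sleep(ThreadLocalRandom.current().nextLong(1_000, 5_000));
            n++; // retry with a higher proposal number
        }
    }

    private boolean tryRound(long n) {
        return true; // placeholder for one Prepare/Promise/Accept!/Accepted exchange
    }
}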
Basic Paxos is notoriously hard to understand. In addition, committing a single proposal (the prepare round plus the accept round for the log entry) takes two RTTs, which is inefficient.
1. Roles
Fewer roles, simpler steps: the root cause of Basic Paxos's livelock is that there are multiple Proposers, so Multi Paxos introduces a new concept, the Leader. To fix the inefficiency of Basic Paxos's two RTTs, Multi Paxos combines the Leader role with a round number I added to every message (the round number I is included along with each value and is incremented in each round by the same Leader), so the two-RTT exchange only happens while electing a Leader; in every other case a single RTT suffices.
Leader: the unique Proposer; every request must go through this Leader.
2. Basic flow
1. In terms of the Basic Paxos Protocol roles:
In the following diagram, only one instance (or "execution") of the basic Paxos protocol, with an initial Leader (a Proposer),
is shown. Note that a Multi-Paxos consists of several instances of the basic Paxos protocol.
Client Proposer Acceptor Learner
| | | | | | | --- First Request ---
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare(N)
| |<---------X--X--X | | Promise(N,I,{Va,Vb,Vc})
| X--------->|->|->| | | Accept!(N,I,V)
| |<---------X--X--X------>|->| Accepted(N,I,V)
|<---------------------------------X--X Response
| | | | | | |
where V = last of (Va, Vb, Vc).
2. In terms of the collapsed Multi Paxos Protocol roles:
A common deployment of the Multi-Paxos consists in collapsing the role of the Proposers, Acceptors and Learners to "Servers".
So, in the end, there are only "Clients" and "Servers".
Client Servers
| | | | --- First Request ---
X-------->| | | Request
| X->|->| Prepare(N)
| |<-X--X Promise(N, I, {Va, Vb})
| X->|->| Accept!(N, I, Vn)
| X<>X<>X Accepted(N, I)
|<--------X | | Response
| | | |
1. In terms of the Basic Paxos Protocol roles (subsequent requests):
In this case, subsequent instances of the basic Paxos protocol (represented by I+1) use the same leader, so phase 1 (of these subsequent
instances of the basic Paxos protocol), which consists of the Prepare and Promise sub-phases, is skipped. Note that the Leader should be stable,
i.e. it should not crash or change.
The following diagram shows such a subsequent instance, with the Proposer, Acceptor and Learner roles still drawn separately.
Client Proposer Acceptor Learner
| | | | | | | --- Following Requests ---
X-------->| | | | | | Request
| X--------->|->|->| | | Accept!(N,I+1,W)
| |<---------X--X--X------>|->| Accepted(N,I+1,W)
|<---------------------------------X--X Response
| | | | | | |
2. In terms of the collapsed Multi Paxos Protocol roles (subsequent requests):
In the subsequent instances of the basic Paxos protocol, with the same leader as in the previous instances of the basic Paxos protocol,
the phase 1 can be skipped.
Client Servers
X-------->| | | Request
| X->|->| Accept!(N,I+1,W)
| X<>X<>X Accepted(N,I+1)
|<--------X | | Response
| | | |
ZAB stands for Zookeeper Atomic Broadcast protocol, the consistency protocol used inside Zookeeper. It is essentially the same as Raft, with some differences in terminology: ZAB calls a leader's period an epoch, while Raft calls it a term. There are also minor implementation differences: Raft guarantees log continuity and sends heartbeats from Leader to Follower, whereas ZAB does the opposite.
For persistent (non-ephemeral) data Nacos uses the Raft protocol; for ephemeral data it uses the Distro protocol.
Demo: http://thesecretlivesofdata.com/raft/
Spring Cloud Alibaba Nacos officially supports both AP and CP consistency protocols since 1.0.0; the CP implementation is based on a simplified Raft.
Since version 1.4, Distro is officially used.
Nacos obviously supports cluster mode, and once a cluster is involved, leader/follower roles come into play. So what mechanism does Nacos use to run its cluster?
A Nacos cluster is similar to Zookeeper: it has a leader role and a follower role. From those role names you can tell that the cluster has an election mechanism, because if it could not elect a leader itself, the roles would more likely be named master/slave. Admittedly, this is only my guess based on how many other components name their roles.
Raft is a strongly consistent, decentralized, highly available distributed consensus protocol designed to solve the distributed consistency problem. Compared with the famous Paxos protocol, Raft is easier to understand, and it is not inferior to Paxos in performance, reliability, or availability. Many middleware systems use Raft to guarantee distributed consistency, for example Redis Sentinel, and leader election in Nacos's CP mode, because Nacos's CP consistency protocol is based on Raft.
The Nacos cluster is implemented with the Raft algorithm, which is relatively simpler than Zookeeper's election algorithm. The core of the election algorithm lives in RaftCore, which also covers data processing and data synchronization.
Leader: handles client requests.
Candidate: a transitional role used during leader election.
Follower: responds to requests from the Leader or from a Candidate.
An election is triggered in two situations:
when the service starts up
when the leader goes down
All nodes start in the follower state. If a follower does not receive a heartbeat from the leader within a certain period (either because there is no leader yet, or because the leader has crashed), it becomes a Candidate and starts an election. Before starting the election it increments its term, which plays the same role as the epoch in Zookeeper.
The follower votes for itself, sends its ballot to the other nodes, and waits for their replies:
If it receives votes from more than half of the nodes, it becomes the leader.
If it learns that another node has already become the leader, it switches back to follower.
If it has not collected a majority of votes within a certain time, it starts a new election.
In the first case, after winning the election the leader sends messages to all nodes so that no other node triggers a new election.
In the second case, suppose there are three nodes A, B, and C. A and B start elections at the same time, and A's vote request reaches C first, so C votes for A. When B's request arrives at C, the one-vote-per-term rule means C will not vote for B, and obviously A and B will not vote for each other. After A wins, it sends heartbeats to B and C; B sees that A's term is not lower than its own, realizes there is already a Leader, and switches to follower.
In the third case, no node obtains a majority of the votes, for example a split vote. Suppose there are four nodes A, B, C, and D. Node C and Node D become candidates at the same time, but Node A votes for Node D and Node B votes for Node C: this is a split vote. Everyone keeps waiting until the election times out and is retried. A split vote prolongs the time the system is unavailable, so Raft introduces randomized election timeouts to make split votes unlikely (see the sketch below).
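A minimal sketch of what a randomized election timeout looks like (illustrative, not Nacos code; the 15s base and 0~5s random range are borrowed from the Nacos timing described later in this article):

// Illustrative sketch of Raft's randomized election timeout.
import java.util.concurrent.ThreadLocalRandom;

public class ElectionTimeoutSketch {

    static final long BASE_TIMEOUT_MS = 15_000;  // base leader timeout (assumed)
    static final long RANDOM_RANGE_MS = 5_000;   // random extra delay (assumed)

    private volatile long leaderDueMs = nextTimeout();
    private volatile long lastHeartbeatMs = System.currentTimeMillis();

    // Called whenever a heartbeat from the leader arrives.
    public void onHeartbeat() {
        lastHeartbeatMs = System.currentTimeMillis();
        leaderDueMs = nextTimeout();
    }

    // Called periodically; returns true when this node should become a candidate.
    public boolean shouldStartElection() {
        return System.currentTimeMillis() - lastHeartbeatMs > leaderDueMs;
    }

    private static long nextTimeout() {
        // base timeout plus a random offset, so peers time out at different moments
        return BASE_TIMEOUT_MS + ThreadLocalRandom.current().nextLong(RANDOM_RANGE_MS);
    }
}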
RaftCore is initialized when the Nacos service starts.
A few core concepts/components (a simplified sketch of the peer fields follows this list):
1. peer: represents one Nacos server and records that server's voting-related metadata, e.g. its ip, who it voted for (voteFor), an AtomicLong term that counts which round of voting this node has started, its state (leader/follower), the leader election interval, and so on.
2. peers: a RaftPeerSet that holds the peer information of the entire cluster.
3. notifier: a thread used for event notification.
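Based on that description, a peer roughly carries the following fields (simplified sketch; names follow the description above and are not guaranteed to match the exact Nacos source):

// Simplified sketch of the per-node voting metadata described above.
import java.util.concurrent.atomic.AtomicLong;

public class RaftPeerSketch {

    public enum State { LEADER, FOLLOWER, CANDIDATE }

    public String ip;                           // this server's address
    public String voteFor;                      // which peer this node voted for in the current term
    public AtomicLong term = new AtomicLong(0); // election round counter
    public volatile State state = State.FOLLOWER;
    public volatile long leaderDueMs;           // remaining time before an election is triggered
    public volatile long heartbeatDueMs;        // remaining time before the next heartbeat
}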
@DependsOn("ProtocolManager")
@Component
public class RaftCore {
    // build a single-threaded scheduled executor
    private final ScheduledExecutorService executor = ExecutorFactory.Managed
            .newSingleScheduledExecutorService(ClassUtils.getCanonicalName(NamingApp.class),
                    new NameThreadFactory("com.alibaba.nacos.naming.raft.notifier"));

    @PostConstruct
    public void init() throws Exception {
        Loggers.RAFT.info("initializing Raft sub-system");
        // start the notifier listener; this thread iterates over the listeners and runs the logic matching each ApplyAction
        executor.submit(notifier);
        final long start = System.currentTimeMillis();
        // on startup, load the local logs first:
        // walk the /nacos/data/naming/data/ directory, i.e. load Datum objects from disk into memory for data recovery
        // (data sync uses a 2PC-style protocol: the leader first writes the request to its disk log, then replicates it)
        // Datum: a key-value pair
        // datums: the in-memory ConcurrentMap store
        raftStore.loadDatums(notifier, datums);
        // set the term, read from the local file /nacos/data/naming/meta.properties; defaults to 0 if absent
        setTerm(NumberUtils.toLong(raftStore.loadMeta().getProperty("term"), 0L));
        Loggers.RAFT.info("cache loaded, datum count: {}, current term: {}", datums.size(), peers.getTerm());
        while (true) {
            if (notifier.tasks.size() <= 0) {
                break;
            }
            Thread.sleep(1000L);
        }
        initialized = true;
        Loggers.RAFT.info("finish to load data from disk, cost: {} ms.", (System.currentTimeMillis() - start));
        // schedule a task every 500ms that decides whether a leader election should be started
        GlobalExecutor.registerMasterElection(new MasterElection());
        // schedule the heartbeat task every 500ms
        GlobalExecutor.registerHeartbeat(new HeartBeat());
        Loggers.RAFT.info("timer started: leader timeout ms: {}, heart-beat timeout ms: {}",
                GlobalExecutor.LEADER_TIMEOUT_MS, GlobalExecutor.HEARTBEAT_INTERVAL_MS);
    }
}
public class MasterElection implements Runnable {

    @Override
    public void run() {
        try {
            // not initialized yet
            if (!peers.isReady()) {
                return;
            }
            // get the peer representing this machine
            RaftPeer local = peers.local();
            // leaderDueMs is the remaining time before an election is triggered; on the first run it is
            // a random value between 0 and 15000 ms, minus 500.
            // Because this task runs every 500ms, each run subtracts TICK_PERIOD_MS (500ms) from leaderDueMs;
            // once it drops below 0 an election is triggered.
            // Every leader heartbeat resets leaderDueMs to 15s + (a random 0~5s).
            local.leaderDueMs -= GlobalExecutor.TICK_PERIOD_MS;
            // if there is still time left, return and wait for the next 500ms tick
            if (local.leaderDueMs > 0) {
                return;
            }
            // reset timeout
            // reset the election interval
            local.resetLeaderDue();
            // reset the heartbeat interval
            local.resetHeartbeatDue();
            // send this node's ballot to the other servers over HTTP
            sendVote();
        } catch (Exception e) {
            Loggers.RAFT.warn("[RAFT] error while master election {}", e);
        }
    }

    private void sendVote() {
        // get this machine's peer info
        RaftPeer local = peers.get(NetUtils.localServer());
        Loggers.RAFT.info("leader timeout, start voting,leader: {}, term: {}", JacksonUtils.toJson(getLeader()),
                local.term);
        // reset peers: set every peer's voteFor and the leader to null
        peers.reset();
        // every election increments the term; it counts which round of voting this node has started
        local.term.incrementAndGet();
        // vote for itself, so this peer's voteFor now points to itself
        local.voteFor = local.ip;
        // set the local server state to CANDIDATE
        local.state = RaftPeer.State.CANDIDATE;
        Map params = new HashMap<>(1);
        params.put("vote", JacksonUtils.toJson(local)); // request parameter carrying this node's ballot
        // iterate over all nodes except this one, send our ballot to each of them, and collect their votes in return
        for (final String server : peers.allServersWithoutMySelf()) {
            // API_VOTE: /raft/vote
            final String url = buildUrl(server, API_VOTE);
            try {
                // send the vote request
                HttpClient.asyncHttpPost(url, null, params, new AsyncCompletionHandler() {
                    @Override
                    public Integer onCompleted(Response response) throws Exception {
                        if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                            Loggers.RAFT
                                    .error("NACOS-RAFT vote failed: {}, url: {}", response.getResponseBody(), url);
                            return 1;
                        }
                        // parse the other server's response
                        RaftPeer peer = JacksonUtils.toObj(response.getResponseBody(), RaftPeer.class);
                        Loggers.RAFT.info("received approve from peer: {}", JacksonUtils.toJson(peer));
                        // decide the leader
                        peers.decideLeader(peer);
                        return 0;
                    }
                });
            } catch (Exception e) {
                Loggers.RAFT.warn("error while sending vote to server: {}", server);
            }
        }
    }
}
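The comments above mention resetLeaderDue() and resetHeartbeatDue(). Their described effect is roughly the following (simplified sketch with assumed constants: a 15s election timeout plus a 0~5s random offset, and a heartbeat interval of a few seconds; not the actual Nacos source):

// Simplified sketch of how the election/heartbeat due times are reset,
// following the timing described in the comments above (values are assumptions).
import java.util.Random;

public class RaftPeerTimingSketch {

    private static final long LEADER_DUE_MS = 15_000;    // base election timeout (assumed)
    private static final long RANDOM_MS = 5_000;         // random extra delay (assumed)
    private static final long HEARTBEAT_DUE_MS = 5_000;  // heartbeat interval (assumed)

    public volatile long leaderDueMs;
    public volatile long heartbeatDueMs;

    private final Random random = new Random();

    // Called after sending a vote or receiving a leader heartbeat:
    // push the next election attempt 15s + (0~5s) into the future.
    public void resetLeaderDue() {
        leaderDueMs = LEADER_DUE_MS + random.nextInt((int) RANDOM_MS);
    }

    // Called before sending heartbeats: reset the heartbeat countdown.
    public void resetHeartbeatDue() {
        heartbeatDueMs = HEARTBEAT_DUE_MS;
    }
}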
Receiving a vote request:
@PostMapping("/vote")
public JsonNode vote(HttpServletRequest request, HttpServletResponse response) throws Exception {
    RaftPeer peer = raftCore.receivedVote(JacksonUtils.toObj(WebUtils.required(request, "vote"), RaftPeer.class));
    return JacksonUtils.transferToJsonNode(peer);
}

public synchronized RaftPeer receivedVote(RaftPeer remote) {
    if (!peers.contains(remote)) {
        throw new IllegalStateException("can not find peer: " + remote.ip);
    }
    // get this machine's peer info
    RaftPeer local = peers.get(NetUtils.localServer());
    // if the requester's term is not greater than our own, the vote request is illegitimate
    if (remote.term.get() <= local.term.get()) {
        String msg = "received illegitimate vote" + ", voter-term:" + remote.term + ", votee-term:" + local.term;
        Loggers.RAFT.info(msg);
        // an empty voteFor means we have not received any other ballot yet, so keep the vote for ourselves
        if (StringUtils.isEmpty(local.voteFor)) {
            local.voteFor = local.ip;
        }
        return local;
    }
    // otherwise the requester's term is greater than ours: the remote node started the vote first, so endorse it
    local.resetLeaderDue(); // reset the local election interval
    local.state = RaftPeer.State.FOLLOWER; // become a follower and vote for the requesting node
    local.voteFor = remote.ip; // vote for the remote node
    local.term.set(remote.term.get()); // adopt the remote term
    Loggers.RAFT.info("vote {} as leader, term: {}", remote.ip, remote.term);
    return local;
}
decideLeader decides which node becomes the leader:
public RaftPeer decideLeader(RaftPeer candidate) {
    peers.put(candidate.ip, candidate);
    SortedBag ips = new TreeBag();
    // highest vote count seen so far
    int maxApproveCount = 0;
    // ip holding the most votes
    String maxApprovePeer = null;
    /**
     * Suppose there are 3 nodes: A, B, C.
     * The local node is A, and A, B, C all start an election in the first round.
     * First round of voting:
     *   1st loop iteration is A's own ballot (voting for itself): maxApproveCount = 1, maxApprovePeer = A
     *   2nd iteration is the ballot returned by server B, which votes for B:
     *     if (ips.getCount(peer.voteFor) > maxApproveCount) is false, so maxApproveCount = 1, maxApprovePeer = A
     *   3rd iteration is the ballot returned by server C, which votes for C:
     *     the condition is false again, so maxApproveCount = 1, maxApprovePeer = A
     * Second round of voting:
     *   1st iteration is A's own ballot (voting for itself): maxApproveCount = 1, maxApprovePeer = A
     *   2nd iteration is the ballot returned by server B, which now votes for A:
     *     the condition holds, so maxApproveCount = 2, maxApprovePeer = A
     *   3rd iteration is the ballot returned by server C, which votes for C:
     *     the condition is false, so maxApproveCount stays 2, maxApprovePeer = A
     */
    for (RaftPeer peer : peers.values()) {
        if (StringUtils.isEmpty(peer.voteFor)) {
            continue;
        }
        // collect the ballot
        ips.add(peer.voteFor);
        if (ips.getCount(peer.voteFor) > maxApproveCount) {
            maxApproveCount = ips.getCount(peer.voteFor);
            maxApprovePeer = peer.voteFor;
        }
    }
    // majorityCount() is the majority size: 2 for a 3-node cluster
    // first round: maxApproveCount = 1, the condition fails, leader is still null, election not decided
    // second round: maxApproveCount = 2, the condition holds, leader becomes A, election succeeds
    if (maxApproveCount >= majorityCount()) {
        RaftPeer peer = peers.get(maxApprovePeer);
        peer.state = RaftPeer.State.LEADER; // become the Leader
        if (!Objects.equals(leader, peer)) {
            leader = peer;
            // if the current leader differs from the newly elected one, replace it and publish a leader-elected event
            ApplicationUtils.publishEvent(new LeaderElectFinishedEvent(this, leader, local()));
            Loggers.RAFT.info("{} has become the LEADER", leader.ip);
        }
    }
    // return the Leader
    return leader;
}
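decideLeader is essentially a vote tally: count how many peers voted for each ip and promote the ip that reaches a majority. The same logic expressed with a plain map (illustrative sketch, not the Nacos implementation):

// Illustrative vote tally equivalent to the SortedBag counting above.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VoteTallySketch {

    // Returns the ip that holds a majority of votes, or null if no one does yet.
    static String decideLeader(List<String> votes, int clusterSize) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String voteFor : votes) {
            if (voteFor == null || voteFor.isEmpty()) {
                continue; // this peer has not voted yet
            }
            int c = counts.merge(voteFor, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = voteFor;
            }
        }
        int majority = clusterSize / 2 + 1;
        return bestCount >= majority ? best : null;
    }
}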
For example, when registering a service, the flow goes through addInstance and eventually calls consistencyService.put(key, instances), which performs the consistency-preserving data synchronization.
InstanceController.register---->registerInstance----->addInstance------>consistencyService.put(key, instances);
public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips)
throws NacosException {
String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
Service service = getService(namespaceId, serviceName);
synchronized (service) {
List instanceList = addIpAddresses(service, ephemeral, ips);
Instances instances = new Instances();
instances.setInstanceList(instanceList);
//data synchronization
consistencyService.put(key, instances);
}
}
consistencyService.put is called to publish the content, i.e. to perform the consistency-preserving synchronization.
@Override
public void put(String key, Record value) throws NacosException {
try {
raftCore.signalPublish(key, value);
} catch (Exception e) {
Loggers.RAFT.error("Raft put failed.", e);
throw new NacosException(NacosException.SERVER_ERROR, "Raft put failed, key:" + key + ", value:" + value,
e);
}
}
public static final Lock OPERATE_LOCK = new ReentrantLock();
public void signalPublish(String key, Record value) throws Exception {
    // if the receiving node is not the Leader
    if (!isLeader()) {
        ObjectNode params = JacksonUtils.createEmptyJsonNode();
        params.put("key", key);
        params.replace("value", JacksonUtils.transferToJsonNode(value));
        Map parameters = new HashMap<>(1);
        parameters.put("key", key);
        // look up the Leader node
        final RaftPeer leader = getLeader();
        // forward the request to the Leader
        raftProxy.proxyPostLarge(leader.ip, API_PUB, params.toString(), parameters);
        return;
    }
    // if this node is the leader, send an onPublish request to every node, including itself
    try {
        // acquire the lock
        OPERATE_LOCK.lock();
        final long start = System.currentTimeMillis();
        final Datum datum = new Datum();
        datum.key = key;
        datum.value = value;
        if (getDatum(key) == null) {
            datum.timestamp.set(1L);
        } else {
            datum.timestamp.set(getDatum(key).timestamp.incrementAndGet());
        }
        ObjectNode json = JacksonUtils.createEmptyJsonNode();
        json.replace("datum", JacksonUtils.transferToJsonNode(datum));
        json.replace("source", JacksonUtils.transferToJsonNode(peers.local()));
        // onPublish also acts like a heartbeat: it resets the election timer, and the term is increased by 100.
        // It also applies the update: first to the disk log, then to the in-memory cache (i.e. write the local log first).
        onPublish(datum, peers.local()); // apply the data locally before replicating to all nodes
        final String content = json.toString();
        // the CountDownLatch implements the "majority commit" barrier
        final CountDownLatch latch = new CountDownLatch(peers.majorityCount());
        // iterate over all nodes and send the commit request, asking them to apply the data written to the local log
        for (final String server : peers.allServersIncludeMyself()) {
            if (isLeader(server)) {
                latch.countDown();
                continue;
            }
            // API_ON_PUB: /raft/datum/commit, a two-phase-commit style request
            final String url = buildUrl(server, API_ON_PUB);
            HttpClient.asyncHttpPostLarge(url, Arrays.asList("key=" + key), content,
                    new AsyncCompletionHandler() {
                        @Override
                        public Integer onCompleted(Response response) throws Exception {
                            if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                                Loggers.RAFT
                                        .warn("[RAFT] failed to publish data to peer, datumId={}, peer={}, http code={}",
                                                datum.key, server, response.getStatusCode());
                                return 1;
                            }
                            latch.countDown();
                            return 0;
                        }

                        @Override
                        public STATE onContentWriteCompleted() {
                            return STATE.CONTINUE;
                        }
                    });
        }
        if (!latch.await(UtilsAndCommons.RAFT_PUBLISH_TIMEOUT, TimeUnit.MILLISECONDS)) {
            // only if a majority of servers return success can we consider this update successful
            Loggers.RAFT.error("data publish failed, caused failed to notify majority, key={}", key);
            throw new IllegalStateException("data publish failed, caused failed to notify majority, key=" + key);
        }
        long end = System.currentTimeMillis();
        Loggers.RAFT.info("signalPublish cost {} ms, key: {}", (end - start), key);
    } finally {
        OPERATE_LOCK.unlock();
    }
}
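The key mechanism above is the CountDownLatch created with peers.majorityCount(): the leader only treats the publish as successful once a majority of nodes (itself included) has acknowledged the commit within the timeout. A stripped-down sketch of that pattern (illustrative; sendCommitAsync() is an assumed helper, not Nacos code):

// Illustrative sketch of "wait for a majority of acks" using a CountDownLatch.
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class MajorityCommitSketch {

    // sendCommitAsync(peer, data, onSuccess) is an assumed async helper that
    // invokes onSuccess when the peer acknowledges the commit.
    static boolean commitToMajority(List<String> peers, byte[] data, long timeoutMs)
            throws InterruptedException {
        int majority = peers.size() / 2 + 1;
        CountDownLatch latch = new CountDownLatch(majority);
        for (String peer : peers) {
            sendCommitAsync(peer, data, latch::countDown);
        }
        // The publish succeeds only if a majority acknowledged within the timeout.
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    static void sendCommitAsync(String peer, byte[] data, Runnable onSuccess) {
        // placeholder for an async HTTP call; invoke onSuccess on HTTP 200
        onSuccess.run();
    }
}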
private volatile ConcurrentMap datums = new ConcurrentHashMap<>();
public void onPublish(Datum datum, RaftPeer source) throws Exception {
    RaftPeer local = peers.local();
    if (datum.value == null) {
        Loggers.RAFT.warn("received empty datum");
        throw new IllegalStateException("received empty datum");
    }

    if (!peers.isLeader(source.ip)) {
        Loggers.RAFT
                .warn("peer {} tried to publish data but wasn't leader, leader: {}", JacksonUtils.toJson(source),
                        JacksonUtils.toJson(getLeader()));
        throw new IllegalStateException("peer(" + source.ip + ") tried to publish " + "data but wasn't leader");
    }

    if (source.term.get() < local.term.get()) {
        Loggers.RAFT.warn("out of date publish, pub-term: {}, cur-term: {}", JacksonUtils.toJson(source),
                JacksonUtils.toJson(local));
        throw new IllegalStateException(
                "out of date publish, pub-term:" + source.term.get() + ", cur-term: " + local.term.get());
    }

    // reset the election interval
    local.resetLeaderDue();

    // if data should be persisted, usually this is true:
    if (KeyBuilder.matchPersistentKey(datum.key)) {
        // write the datum to local disk
        raftStore.write(datum);
    }
    // and store it in memory as well
    datums.put(datum.key, datum);

    // if this node is the leader, increase the term by 100
    if (isLeader()) {
        local.term.addAndGet(PUBLISH_TERM_INCREASE_COUNT);
    } else {
        if (local.term.get() + PUBLISH_TERM_INCREASE_COUNT > source.term.get()) {
            //set leader term:
            getLeader().term.set(source.term.get());
            local.term.set(getLeader().term.get());
        } else {
            local.term.addAndGet(PUBLISH_TERM_INCREASE_COUNT);
        }
    }
    // update the term value in the local meta.properties file
    raftStore.updateTerm(local.term.get());

    notifier.addTask(datum.key, ApplyAction.CHANGE);

    Loggers.RAFT.info("data added/updated, key={}, term={}", datum.key, local.term);
}
Now let's look at how the other nodes handle the leader's request, i.e. the code behind the /v1/ns/raft/datum/commit endpoint:
@PostMapping("/datum/commit")
public String onPublish(HttpServletRequest request, HttpServletResponse response) throws Exception {
    response.setHeader("Content-Type", "application/json; charset=" + getAcceptEncoding(request));
    response.setHeader("Cache-Control", "no-cache");
    response.setHeader("Content-Encode", "gzip");

    String entity = IoUtils.toString(request.getInputStream(), "UTF-8");
    String value = URLDecoder.decode(entity, "UTF-8");
    JsonNode jsonObject = JacksonUtils.toObj(value);
    String key = "key";

    RaftPeer source = JacksonUtils.toObj(jsonObject.get("source").toString(), RaftPeer.class);
    JsonNode datumJson = jsonObject.get("datum");

    Datum datum = null;
    // deserialize the datum according to its key type
    if (KeyBuilder.matchInstanceListKey(datumJson.get(key).asText())) {
        datum = JacksonUtils.toObj(jsonObject.get("datum").toString(), new TypeReference<Datum<Instances>>() {
        });
    } else if (KeyBuilder.matchSwitchKey(datumJson.get(key).asText())) {
        datum = JacksonUtils.toObj(jsonObject.get("datum").toString(), new TypeReference<Datum<SwitchDomain>>() {
        });
    } else if (KeyBuilder.matchServiceMetaKey(datumJson.get(key).asText())) {
        datum = JacksonUtils.toObj(jsonObject.get("datum").toString(), new TypeReference<Datum<Service>>() {
        });
    }
    raftConsistencyService.onPut(datum, source);
    return "ok";
}
The core call is raftConsistencyService.onPut(datum, source); let's step into that method:
public void onPut(Datum datum, RaftPeer source) throws NacosException {
    try {
        // write the data locally
        raftCore.onPublish(datum, source);
    } catch (Exception e) {
        Loggers.RAFT.error("Raft onPut failed.", e);
        throw new NacosException(NacosException.SERVER_ERROR,
                "Raft onPut failed, datum:" + datum + ", source: " + source, e);
    }
}
Due to length constraints, the core flow of the Distro protocol and its source-code analysis are published on CSDN; see https://blog.csdn.net/Eclipse_2019/article/details/125906105