Reposted from: https://medium.com/wish-engineering/katalog-sync-reliable-integration-of-consul-and-kubernetes-ebe8aae0852a

Why use consul with Kubernetes (k8s)?

Consul is a well-known and widely used service discovery mechanism. Here at Wish, we standardized on consul as our service discovery system quite some time ago. Although k8s has built-in service discovery, we want to keep consul as our primary mechanism. This way, services in k8s are discoverable outside of k8s and aren’t tied to a specific cluster, and when a service needs to ramp in or out of k8s it can do so gradually.

The previous solution: sidecar consul-agent

When we launched k8s we decided to add a consul-agent sidecar to each pod. In our k8s environment, each pod has a routable IP in our VPC, so this functions pretty well. However, after using this approach for several months we noticed a few pain points:

  1. Configuration: Each service/namespace in k8s needs to have the same consul configuration (client configuration, encryption key, etc.). We largely dealt with this through jsonnet templating, but even so, we ended up having the encryption key in each namespace and a fair amount of configuration duplicated across services.
  2. Complexity: Using this sidecar approach means that we now have a full-blown consul node for each pod in k8s. For the initial migration to k8s things were generally moved over 1:1, EC2 instance → pod, but as we continued to refine our sizing etc. we ended up having significantly more pods than we had EC2 instances before. In addition, this means we effectively had N nodes participating in the consul memberlist running on the same instance or hardware.
  3. Failure modes: With a consul-agent sidecar on each pod we can run into thundering herd issues in consul failure modes due to the large number of nodes in the cluster.
  4. Noisy alerts: Consul’s memberlist expects members to be more-or-less long-lived. Deregistration of a node in the memberlist takes (by default) 72h which means that a node will still be part of the memberlist even after leaving intentionally. In practice, this is a nuisance as the node still shows up in consul’s service discovery until it drops off (e.g. Prometheus’ consul discovery).
  5. Consul checks vs k8s checks: Probably the most painful issue we’ve run into is configuring consul checks. K8s itself has concepts of liveness and readiness which are used within k8s to manage the pods themselves. In addition to this k8s readiness, we also needed to configure consul so it would add/remove the service from rotation based on the pod’s readiness. Operationally this is painful to keep in sync, as consul and k8s offer different mechanisms for health checks (see the sketch after this list).
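To make that duplication concrete, below is a minimal sketch of the sidecar pattern described above. All names, ports, and paths are hypothetical, and the consul service registration is shown as a ConfigMap purely for brevity; this illustrates the pattern rather than our actual manifests. Note that the same health endpoint has to be wired up twice: once as a k8s readinessProbe and once as a consul check.

# Consul service registration mounted into the consul-agent sidecar.
# The HTTP check here duplicates the readinessProbe below.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-service-consul
data:
  service.json: |
    {
      "service": {
        "name": "my-service",
        "port": 8080,
        "check": {
          "http": "http://localhost:8080/healthz",
          "interval": "10s"
        }
      }
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: my-service
spec:
  containers:
    - name: app
      image: my-service:latest        # hypothetical application image
      ports:
        - containerPort: 8080
      readinessProbe:                 # k8s' own view of readiness
        httpGet:
          path: /healthz
          port: 8080
    - name: consul-agent              # full consul agent per pod
      image: consul:1.4.0
      args: ["agent", "-config-dir=/consul/config"]
      volumeMounts:
        - name: consul-service
          mountPath: /consul/config
  volumes:
    - name: consul-service
      configMap:
        name: my-service-consul

Every service/namespace ends up carrying a copy of this consul configuration (plus the client configuration and encryption key from point 1), and the readinessProbe and the consul check have to be kept in sync by hand.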

Looking for alternatives: consul-k8s

At the end of last year, HashiCorp announced consul-k8s as a mechanism to sync services to/from k8s and consul. We were excited to switch to a more k8s-native mechanism for syncing state to consul, and quickly started prototyping with it. Going into it we listed our requirements as:

  • Configuration through k8s annotations
  • Readiness sync
  • High availability with no single point of failure (SPOF)

The good

Consul-k8s offers mechanisms to sync services both from k8s → consul and from consul → k8s. We don’t have a need for consul → k8s, so we’ll focus on the k8s → consul sync. The sync operates at the service level, which means you can configure syncing etc. per service in k8s through annotations. For example (borrowed from here):

kind: Service
apiVersion: v1
metadata:
  name: my-service
  annotations:
    "consul.hashicorp.com/service-name": my-consul-service

This configuration-through-annotation both dramatically simplifies templating and is significantly easier to understand.
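Other behavior can be controlled on the same model, for example opting a service in or out of the sync, or setting the tags and port to register in consul. The exact annotation set and defaults depend on the consul-k8s version in use, so treat the names and values below as illustrative:

kind: Service
apiVersion: v1
metadata:
  name: my-service
  annotations:
    # opt this service in or out of the k8s -> consul sync
    "consul.hashicorp.com/service-sync": "true"
    # tags and port to register in consul
    "consul.hashicorp.com/service-tags": "k8s,web"
    "consul.hashicorp.com/service-port": "http"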

The bad

Unsurprisingly (since we are writing this post) we ran into some issues while testing out consul-k8s. Initially, we hit some problems with multi-cluster support, but those were resolved relatively quickly. After getting a proof of concept working with multi-cluster support we started some failure-mode testing. During this testing, we found 2 major issues.

In addition to those issues, we found a requirement we didn’t know we had! With the sidecar consul-agent approach, if the consul-agent was unable to join the cluster for some reason, the pod would fail and k8s would halt the deployment. Consul-k8s, however, is a single process for the whole cluster which asynchronously syncs state from k8s to consul:

  1. Kubelet starts container on Node
  2. Kubelet updates k8s API
  3. Consul-k8s notices change in k8s-api
  4. Consul-k8s pushes change to consul

This means the ability of consul-k8s to sync k8s state to consul is completely independent of the k8s pod deployments themselves. As a result, we could easily create scenarios where an entire service completes a rolling update (with new pod IPs, etc.) before that state is synced to consul, leaving service discovery with 0 correct entries and clients unable to connect to the service.
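For context, the sync process is typically deployed as its own singleton Deployment, completely separate from the workloads it registers. The sketch below is only an illustration of that shape: the image tag, flags, and consul address are assumptions rather than a tested manifest.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: consul-k8s-sync
spec:
  replicas: 1                                    # one sync process for the whole cluster
  selector:
    matchLabels:
      app: consul-k8s-sync
  template:
    metadata:
      labels:
        app: consul-k8s-sync
    spec:
      containers:
        - name: consul-k8s
          image: hashicorp/consul-k8s:0.5.0      # assumed tag
          args:
            - "sync-catalog"
            - "-to-consul=true"                  # k8s -> consul only
            - "-to-k8s=false"
          env:
            - name: CONSUL_HTTP_ADDR             # assumed consul client address
              value: "http://consul.service.consul:8500"

Because this process runs out of band, a rolling update of a workload proceeds regardless of whether this Deployment is healthy or whether consul ever sees the new pod IPs, which is exactly the gap described above.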
