Canary

In this module we will:

  • Review and understand the Canary Strategy

  • Deploy the Canary Rollout to the production namespace

  • Promote a new image and observe the Canary behavior

  • Replace a manual pause step for testing with an Analysis

Canary Strategy

Canary can be thought of as an advanced version of blue-green. Instead of an abrupt cut-over of live traffic between the current and new revisions, traffic to the new revision is increased gradually in increments, or steps, while traffic to the current revision is decreased by the same amount. Like the proverbial canary in the coal mine, the intent is to expose users to the new version over time and, if unexpected problems occur, revert to the previous version.

This process is depicted in the diagram below that shows how traffic is slowly migrated from the current revision to the new revision until the process is completed.

overview canary

To manage the transition of traffic between stable and canary services, Argo Rollouts supports a variety of traffic management solutions. In the absence of a traffic management solution, Rollouts will manage traffic weight on a best effort basis by adjusting the number of pod replicas associated with each service.
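As a rough illustration of the best-effort approach, the sketch below estimates how many canary pods a given weight implies when only replica counts are available. It assumes the canary count is the weight percentage of the total replicas, rounded up. The function name canary_replicas is hypothetical, and this is not the controller's actual code.

```shell
# Estimate the canary pod count for a weight, rounding up (an assumption).
# $1 = total replicas, $2 = canary weight in percent
canary_replicas() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

canary_replicas 8 20   # prints 2
canary_replicas 8 60   # prints 5
```

With only 8 replicas the achievable split is coarse, which is one reason a traffic management solution gives more precise weighting.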

Deploy Canary Rollout

Now we will deploy the canary rollout in the USER_PLACEHOLDER-prod namespace, following the same process that we used for the blue-green rollout in the previous modules. Prior to starting, confirm you are still at the correct path.

cd ~/argo-rollouts-workshop/content/modules/ROOT/examples/

Next, let’s explore the manifests that we will be deploying in the ./canary/base folder:

ls ./canary/base

Similar to previous modules, note we have files for rollout.yaml, services.yaml and routes.yaml which are our Kubernetes resources for Rollout, Services and Routes respectively. Examining the Rollout first you will see the following:

cat ./canary/base/rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 8
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: quay.io/openshiftdemos/rollouts-demo:blue
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
  strategy:
    canary:
      canaryService: canary
      stableService: stable
      trafficRouting:
        plugins:
          argoproj-labs/openshift:
            routes:
              - stable
      steps:
      - setWeight: 20
      - pause: {}
      - setWeight: 40
      - pause: {duration: 10s}
      - setWeight: 60
      - pause: {duration: 10s}
      - setWeight: 80
      - pause: {duration: 10s}

In the rollout manifest we have changed our strategy from blue-green to canary. As with the blue-green strategy, we specify the services to use; here, however, they are the stable and canary services. Unlike blue-green, we explicitly define the promotion process by specifying a series of discrete steps. At each step we can set the traffic weight between the services, pause, or perform an inline analysis.

For this example, the first step sets the weight to 20% and then the second step pauses indefinitely since no duration is specified. This will allow us to observe the behavior of the canary and validate that it is performing as expected before performing a manual promotion.

Once the manual promotion has been performed, the remaining steps will continue to increase the traffic weight with short pauses between each step. For the pause step, the duration can be specified in seconds (s), minutes (m) or hours (h).
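To make the duration units concrete, here is a minimal sketch that converts pause durations into seconds and totals the timed pauses in the rollout above. The helper name duration_to_seconds is hypothetical, and treating a bare number as seconds is an assumption of this sketch.

```shell
# Convert a pause duration such as 10s, 5m or 1h into seconds.
# A bare number is treated as seconds (an assumption of this sketch).
duration_to_seconds() {
  case "$1" in
    *s) echo "${1%s}" ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *h) echo $(( ${1%h} * 3600 )) ;;
    *)  echo "$1" ;;
  esac
}

# Total timed pause for the three 10s steps in the rollout above.
total=0
for d in 10s 10s 10s; do
  total=$(( total + $(duration_to_seconds "$d") ))
done
echo "$total"   # prints 30
```

The indefinite pause is not counted, since it lasts until someone promotes the rollout.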

For the purposes of this workshop the pause sequences are deliberately short. It is common to have much longer pauses for more complex applications.

Finally note the trafficRouting stanza in the canary strategy. This tells Argo Rollouts to use the OpenShift traffic manager plugin to automatically manage the service weighting between stable and canary services as the steps are executed. Without this plugin Rollouts would provide best effort for traffic shaping by managing the scaling of pod replicas between stable and canary.

The OpenShift traffic manager plugin is included in the OpenShift GitOps distribution of Argo Rollouts and does not need to be installed manually.

Next let’s look at the Kubernetes services that are defined:

cat ./canary/base/services.yaml
apiVersion: v1
kind: Service
metadata:
  name: stable
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: 8080
  selector:
    app: rollouts-demo
---
apiVersion: v1
kind: Service
metadata:
  name: canary
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: 8080
  selector:
    app: rollouts-demo

As expected, the stable and canary services have been defined as referenced in the Rollout. With the services out of the way, let’s examine the routes:

cat ./canary/base/routes.yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/disable_cookies: 'true'
    haproxy.router.openshift.io/balance: roundrobin
  name: stable
spec:
  port:
    targetPort: http
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: stable
    weight: 100
  alternateBackends:
    - kind: Service
      name: canary
      weight: 0
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: canary
spec:
  port:
    targetPort: http
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: canary

Here a route has been defined to match each of the services shown previously. The stable route references two services, stable and canary, with default weights of 100 and 0 respectively. As the steps of the canary progress, the OpenShift traffic manager plugin will dynamically modify these weights to match the value of the current step.

Finally, note that the stable route uses OpenShift Route annotations to disable sticky sessions and to use round-robin load balancing. This enables us to properly observe the split of traffic as Rollouts shifts traffic between the stable and canary services, without interference from OpenShift’s load balancer.

To deploy the canary rollout, use the following command to process the kustomization:

oc apply -k ./canary/base -n USER_PLACEHOLDER-prod

Once you have run the command we can confirm that the rollout has deployed successfully. Use the following command to ensure that the rollout is up and running and in a Healthy state:

oc argo rollouts get rollout rollouts-demo -n USER_PLACEHOLDER-prod

The console should return something similar to:

Name:            rollouts-demo
Namespace:       USER_PLACEHOLDER-prod
Status:          ✔ Healthy
Strategy:        Canary
  Step:          8/8
  SetWeight:     100
  ActualWeight:  100
Images:          quay.io/openshiftdemos/rollouts-demo:blue (stable)
Replicas:
  Desired:       8
  Current:       8
  Updated:       8
  Ready:         8
  Available:     8

NAME                                       KIND        STATUS     AGE  INFO
⟳ rollouts-demo                            Rollout     ✔ Healthy  24s
└──# revision:1
   └──⧉ rollouts-demo-66d84bcd76           ReplicaSet  ✔ Healthy  24s  stable
      ├──□ rollouts-demo-66d84bcd76-55fd9  Pod         ✔ Running  24s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-8jh88  Pod         ✔ Running  24s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-d29gr  Pod         ✔ Running  24s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-d8vk9  Pod         ✔ Running  24s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-gqlkq  Pod         ✔ Running  24s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-hr77t  Pod         ✔ Running  24s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-rprt7  Pod         ✔ Running  24s  ready:1/1
      └──□ rollouts-demo-66d84bcd76-wkg2s  Pod         ✔ Running  24s  ready:1/1

The command shows additional information about the canary rollout including the number of steps and the weight between services.
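If you would rather check the state from a script than read the full output, a small filter can pull out the Status line. This is a sketch only: the function name rollout_status is hypothetical, and it assumes the output layout shown above, with the state as the last word on the Status line.

```shell
# Print the Status value from `get rollout` output read on stdin.
rollout_status() {
  awk '$1 == "Status:" { print $NF; exit }'
}

# Intended use against the cluster (not run here):
#   oc argo rollouts get rollout rollouts-demo -n USER_PLACEHOLDER-prod | rollout_status

# Demonstration with two lines copied from the sample output above.
printf '%s\n' 'Name:            rollouts-demo' 'Status:          ✔ Healthy' | rollout_status   # prints Healthy
```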

Confirm the application is accessible by checking the stable route.

The application is running with blue squares for the current version of the application:

rollouts demo app blue

If you go to the Argo Rollouts Dashboard you can see that the dashboard displays the steps that are defined in the rollout.

argo rollouts dashboard canary

Promote Image

In this section we will promote a new image and observe the behavior of the canary rollout using the same pipeline that we used previously. As a reminder, the pipeline can be accessed in the USER_PLACEHOLDER-tools namespace.

console pipelines overview

Go ahead and start the pipeline selecting the green image and wait for the pipeline to complete:

console promote params

Once the pipeline is complete, run this command to see the state of the rollout:

oc argo rollouts get rollout rollouts-demo -n USER_PLACEHOLDER-prod

You should see output similar to the following:

Name:            rollouts-demo
Namespace:       USER_PLACEHOLDER-prod
Status:          ॥ Paused
Message:         CanaryPauseStep
Strategy:        Canary
  Step:          1/8
  SetWeight:     20
  ActualWeight:  20
Images:          quay.io/openshiftdemos/rollouts-demo:blue (stable)
                 quay.io/openshiftdemos/rollouts-demo:green (canary)
Replicas:
  Desired:       8
  Current:       10
  Updated:       2
  Ready:         10
  Available:     10

NAME                                       KIND        STATUS     AGE   INFO
⟳ rollouts-demo                            Rollout     ॥ Paused   2m7s
├──# revision:2
│  └──⧉ rollouts-demo-5999df6cf9           ReplicaSet  ✔ Healthy  58s   canary
│     ├──□ rollouts-demo-5999df6cf9-mqpc4  Pod         ✔ Running  58s   ready:1/1
│     └──□ rollouts-demo-5999df6cf9-nxwt4  Pod         ✔ Running  58s   ready:1/1
└──# revision:1
   └──⧉ rollouts-demo-66d84bcd76           ReplicaSet  ✔ Healthy  2m7s  stable
      ├──□ rollouts-demo-66d84bcd76-5dpb5  Pod         ✔ Running  2m7s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-9rbtg  Pod         ✔ Running  2m7s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-cj6ql  Pod         ✔ Running  2m7s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-dkdpd  Pod         ✔ Running  2m7s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-fkbpb  Pod         ✔ Running  2m7s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-j8pfg  Pod         ✔ Running  2m7s  ready:1/1
      ├──□ rollouts-demo-66d84bcd76-wgw5h  Pod         ✔ Running  2m7s  ready:1/1
      └──□ rollouts-demo-66d84bcd76-wvqw9  Pod         ✔ Running  2m7s  ready:1/1

There are a few things of note here. First, the status of the Rollout is Paused due to the pause step with no duration. Second, we have two ReplicaSets, one with 2 pods and the other with 8 pods, corresponding to the canary and stable services respectively. Recall that our first step set a weight of 20% for the canary service.
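To check the pod split without counting by hand, the tree output can be summarized per ReplicaSet. The sketch below is only an illustration: the function name pods_per_replicaset is hypothetical, and it assumes each pod name is its ReplicaSet name plus a final dash-separated suffix, as in the output above.

```shell
# Count pods per ReplicaSet from `get rollout` tree output read on stdin.
pods_per_replicaset() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i == "Pod") {
        rs = $(i - 1)             # pod name sits just before the KIND column
        sub(/-[^-]*$/, "", rs)    # drop the pod suffix to get the ReplicaSet name
        count[rs]++
      }
    }
  }
  END { for (rs in count) print rs, count[rs] }' | sort
}

# Intended use against the cluster (not run here):
#   oc argo rollouts get rollout rollouts-demo -n USER_PLACEHOLDER-prod | pods_per_replicaset

# Demonstration with a few lines copied from the sample output above.
pods_per_replicaset <<'EOF'
│     ├──□ rollouts-demo-5999df6cf9-mqpc4  Pod         ✔ Running  58s   ready:1/1
│     └──□ rollouts-demo-5999df6cf9-nxwt4  Pod         ✔ Running  58s   ready:1/1
      ├──□ rollouts-demo-66d84bcd76-5dpb5  Pod         ✔ Running  2m7s  ready:1/1
EOF
```

For the full output above, this would report 2 pods for the canary ReplicaSet and 8 for the stable one.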

Next, visit the Argo Rollouts Dashboard and note that the rollout is paused on the pause step:

argo rollouts dashboard canary pause

Now let’s see the behavior of the routes. First, if you check the stable route you will see approximately 20% green squares versus 80% blue squares, reflecting the 20% weighting of canary in the first step:

rollouts demo app canary blue green

If you view the stable route, you will see that the 80/20 weighting between the stable and canary services has been set by the OpenShift traffic manager:

oc get route stable -n USER_PLACEHOLDER-prod -o yaml | oc neat
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/balance: roundrobin
    haproxy.router.openshift.io/disable_cookies: "true"
    openshift.io/host.generated: "true"
  name: stable
  namespace: user1-prod
spec:
  alternateBackends:
  - kind: Service
    name: canary
    weight: 20
  host: stable-user1-prod.apps.cluster-5qnlc.5qnlc.sandbox2820.opentlc.com
  port:
    targetPort: http
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: stable
    weight: 80
  wildcardPolicy: None
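To read just the weights out of the route, a short filter over the YAML is enough. This is a sketch under assumptions: the function name route_weights is hypothetical, and it relies on each backend listing its name on its own line before its weight, as in the output above. It is not a general YAML parser.

```shell
# Print "<service> <weight>" pairs from Route YAML read on stdin.
route_weights() {
  awk '$1 == "name:"   { name = $2 }
       $1 == "weight:" { print name, $2 }'
}

# Intended use against the cluster (not run here):
#   oc get route stable -n USER_PLACEHOLDER-prod -o yaml | route_weights

# Demonstration with the backend section of the route shown above.
route_weights <<'EOF'
spec:
  alternateBackends:
  - kind: Service
    name: canary
    weight: 20
  to:
    kind: Service
    name: stable
    weight: 80
EOF
```

For this input the filter prints canary 20 followed by stable 80, matching the first step of the rollout.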

Next, if we check the canary route we should see only the green version of the application.

rollouts demo app green

To promote the rollout, you can either promote it from the dashboard using the Promote button or you can promote it using the following command:

oc argo rollouts promote rollouts-demo -n USER_PLACEHOLDER-prod
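When scripting the promotion, it can help to guard it so that it only runs while the rollout is actually paused. The wrapper below is a hedged sketch: promote_if_paused and the OC override variable are hypothetical names introduced for illustration, and the status value is expected to be supplied by the caller.

```shell
# Promote only when the rollout reports Paused; otherwise do nothing.
# OC defaults to the oc client; set OC=echo for a dry run (an assumption of this sketch).
# $1 = rollout name, $2 = namespace, $3 = current status
promote_if_paused() {
  if [ "$3" = "Paused" ]; then
    "${OC:-oc}" argo rollouts promote "$1" -n "$2"
  else
    echo "rollout $1 is $3, not promoting"
  fi
}

# Dry run: prints the arguments that would be passed to oc instead of calling it.
OC=echo promote_if_paused rollouts-demo user1-prod Paused
```

The dry run prints argo rollouts promote rollouts-demo -n user1-prod; without the OC override the same call would run the promote command shown above.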

Observe the dashboard once it has been promoted. The dashboard will show the progression of the steps by highlighting each step as it is being executed. Also note how pods are being added to the new revision as traffic weighting changes.

Prior to moving on to the next section, perform a cleanup to remove the current rollout and reset the Deployment in the dev environment back to blue.

Update the deployment:

oc apply -k ./deploy/base -n USER_PLACEHOLDER-dev

Delete the current rollout:

oc delete -k ./canary/base -n USER_PLACEHOLDER-prod

Inline Analysis

In the last section there was a pause step that provided an opportunity to manually test the canary before progressing further. However we can accomplish the same goal by using an analysis. With respect to the canary strategy, an analysis can be performed in the background or as an inline analysis.

A Background Analysis happens asynchronously and does not block the progression of steps, however if the analysis fails it will abort the rollout similar to what we saw in the previous module with the blue-green strategy. In the case of an Inline Analysis, the analysis is performed as a discrete step and will block the progression of the rollout until it completes.

In the following example we will implement an Inline Analysis. The files for this example are in the ./canary-analysis/base folder. To view the list of files, perform an ls as follows:

ls ./canary-analysis/base

Note that the files are identical to the previous example other than the rollout.yaml and the analysistemplate.yaml file. The AnalysisTemplate being used here is identical to the one we used in the blue-green example so we will not cover it again here.

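The key change in the rollout is that an inline analysis step now replaces the indefinite pause. Note that the setWeight: 40 step is also absent, leaving seven steps in total, as per below: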

cat ./canary-analysis/base/rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 8
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: quay.io/openshiftdemos/rollouts-demo:blue
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
  strategy:
    canary:
      canaryService: canary
      stableService: stable
      trafficRouting:
        plugins:
          argoproj-labs/openshift:
            routes:
              - stable
      steps:
      - setWeight: 20
      - analysis:
          templates:
          - templateName: smoke-tests
          args:
            - name: namespace
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: route-name
              value: canary
            - name: route-url
              value: canary-%USER%-prod.%SUB_DOMAIN%
      - pause: {duration: 10s}
      - setWeight: 60
      - pause: {duration: 10s}
      - setWeight: 80
      - pause: {duration: 10s}

Notice that the structure of the inline analysis is identical to what was used in the prePromotionAnalysis in the blue-green rollout with analysis.

To deploy the canary with the inline analysis execute the following command:

kustomize build ./canary-analysis/base | sed "s/%SUB_DOMAIN%/OPENSHIFT_CLUSTER_INGRESS_DOMAIN_PLACEHOLDER/" | sed "s/%USER%/USER_PLACEHOLDER/" | oc apply -n USER_PLACEHOLDER-prod -f -
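The two sed stages in the command above simply fill in the %USER% and %SUB_DOMAIN% placeholders in the rendered manifest. The following sketch shows the same substitution in isolation on a single line. The function name render_placeholders and the sample values user1 and apps.example.com are hypothetical.

```shell
# Replace the %USER% and %SUB_DOMAIN% placeholders in text read on stdin.
# $1 = user name, $2 = cluster ingress sub-domain (sample values, not real ones)
render_placeholders() {
  sed -e "s/%USER%/$1/" -e "s/%SUB_DOMAIN%/$2/"
}

echo 'value: canary-%USER%-prod.%SUB_DOMAIN%' | render_placeholders user1 apps.example.com
# prints: value: canary-user1-prod.apps.example.com
```

Because the replacement values are interpolated into the sed expression, this form assumes they contain no slash characters.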

Once the command has been executed, verify that the rollout was deployed:

oc argo rollouts get rollout rollouts-demo -n USER_PLACEHOLDER-prod

You should see output as follows:

Name:            rollouts-demo
Namespace:       USER_PLACEHOLDER-prod
Status:          ✔ Healthy
Strategy:        Canary
  Step:          7/7
  SetWeight:     100
  ActualWeight:  100
Images:          quay.io/openshiftdemos/rollouts-demo:blue (stable)
Replicas:
  Desired:       8
  Current:       8
  Updated:       8
  Ready:         8
  Available:     8

NAME                                       KIND        STATUS     AGE    INFO
⟳ rollouts-demo                            Rollout     ✔ Healthy  4m56s
└──# revision:1
   └──⧉ rollouts-demo-66d84bcd76           ReplicaSet  ✔ Healthy  5s     stable
      ├──□ rollouts-demo-66d84bcd76-c4d6j  Pod         ✔ Running  5s     ready:1/1
      ├──□ rollouts-demo-66d84bcd76-f9qvw  Pod         ✔ Running  5s     ready:1/1
      ├──□ rollouts-demo-66d84bcd76-gp9xp  Pod         ✔ Running  5s     ready:1/1
      ├──□ rollouts-demo-66d84bcd76-gpqwj  Pod         ✔ Running  5s     ready:1/1
      ├──□ rollouts-demo-66d84bcd76-k6dwl  Pod         ✔ Running  5s     ready:1/1
      ├──□ rollouts-demo-66d84bcd76-mlj5q  Pod         ✔ Running  5s     ready:1/1
      ├──□ rollouts-demo-66d84bcd76-wp4tj  Pod         ✔ Running  5s     ready:1/1
      └──□ rollouts-demo-66d84bcd76-z8kr2  Pod         ✔ Running  5s     ready:1/1

Next, examining the Argo Rollouts Dashboard we can see the inline Analysis being shown with the other steps:

argo rollouts dashboard canary analysis

Since we no longer have a manual pause step, the promotion will complete automatically as long as the analysis step executes successfully.

Let’s do our promotion to the green image. Go to the USER_PLACEHOLDER-tools namespace and start the pipeline again. Once the pipeline has completed, observe the behavior of the canary deployment during the process in the Argo Rollouts Dashboard.

If you want to try this multiple times to look at different things, feel free to use the pipeline to deploy different colors. Remember that the available colors are listed here.

In the dashboard, if you catch it before it completes, you can see the analysis step executing. Similar to what we saw in the previous module, the analysis step shows a blue button while it is executing, which turns green when it has completed successfully or red if it has failed.

Here is what the dashboard looks like while the analysis is executing:

argo rollouts dashboard analysis executing

Once the promotion is completed, the dashboard will appear as follows:

argo rollouts dashboard analysis completed

In the command line, you can view the rollout using the command:

oc argo rollouts get rollout rollouts-demo -n USER_PLACEHOLDER-prod

Information about the rollout will appear as follows:

Name:            rollouts-demo
Namespace:       USER_PLACEHOLDER-prod
Status:          ✔ Healthy
Strategy:        Canary
  Step:          7/7
  SetWeight:     100
  ActualWeight:  100
Images:          quay.io/openshiftdemos/rollouts-demo:green (stable)
Replicas:
  Desired:       8
  Current:       8
  Updated:       8
  Ready:         8
  Available:     8

NAME                                                        KIND         STATUS        AGE    INFO
⟳ rollouts-demo                                             Rollout      ✔ Healthy     4m19s
├──# revision:2
│  ├──⧉ rollouts-demo-5999df6cf9                            ReplicaSet   ✔ Healthy     3m13s  stable
│  │  ├──□ rollouts-demo-5999df6cf9-g7l75                   Pod          ✔ Running     3m13s  ready:1/1
│  │  ├──□ rollouts-demo-5999df6cf9-zxkss                   Pod          ✔ Running     3m13s  ready:1/1
│  │  ├──□ rollouts-demo-5999df6cf9-mj9m8                   Pod          ✔ Running     80s    ready:1/1
│  │  ├──□ rollouts-demo-5999df6cf9-ph9jk                   Pod          ✔ Running     80s    ready:1/1
│  │  ├──□ rollouts-demo-5999df6cf9-rnvgm                   Pod          ✔ Running     80s    ready:1/1
│  │  ├──□ rollouts-demo-5999df6cf9-btzlf                   Pod          ✔ Running     67s    ready:1/1
│  │  ├──□ rollouts-demo-5999df6cf9-gl8k4                   Pod          ✔ Running     67s    ready:1/1
│  │  └──□ rollouts-demo-5999df6cf9-dlchv                   Pod          ✔ Running     54s    ready:1/1
│  └──α rollouts-demo-5999df6cf9-2-1                        AnalysisRun  ✔ Successful  3m10s  ✔ 5
│     └──⊞ fd0f7c64-c6e4-4447-bbef-5d2f4f62563b.run-load.1  Job          ✔ Successful  3m10s
└──# revision:1
   └──⧉ rollouts-demo-66d84bcd76                            ReplicaSet   • ScaledDown  4m19s
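To confirm the analysis result from a script, the AnalysisRun line can be filtered out of the same output. This is a sketch only: the function name analysis_runs is hypothetical, and it assumes the column layout shown above, where the status word follows a single status icon.

```shell
# Print "<analysisrun-name> <status>" for each AnalysisRun in
# `get rollout` tree output read on stdin.
analysis_runs() {
  awk '{
    for (i = 2; i + 2 <= NF; i++) {
      if ($i == "AnalysisRun") print $(i - 1), $(i + 2)
    }
  }'
}

# Intended use against the cluster (not run here):
#   oc argo rollouts get rollout rollouts-demo -n USER_PLACEHOLDER-prod | analysis_runs

# Demonstration with the AnalysisRun line from the sample output above.
analysis_runs <<'EOF'
│  └──α rollouts-demo-5999df6cf9-2-1                        AnalysisRun  ✔ Successful  3m10s  ✔ 5
EOF
# prints: rollouts-demo-5999df6cf9-2-1 Successful
```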

In this module the canary strategy for Argo Rollouts has been reviewed along with how to use an inline analysis step to perform testing of the canary deployment.

More Information