Getting started with Azure DevOps YAML Pipelines (Part 1)


What is Azure Pipelines?

Whether you are in DevOps, IT Operations, or Software Development, there is an inherent need to automate monotonous tasks to reduce toil, and Azure Pipelines can help you with that.

Azure Pipelines is one of the services offered by Azure DevOps. It provides a platform on which you can run various automation tasks against any repository supported by Azure DevOps, and even build your own CI/CD pipeline. One option is to use YAML-based pipelines, which are the focus of this article.

So how does it work?

You write a YAML file containing your pipeline instructions and register it as a pipeline on the platform. When the pipeline is triggered (manually, on each commit pushed to the repository, or by other means), a pipeline agent fetches your YAML configuration file. This agent is either Microsoft-hosted or self-hosted (if you prefer your pipelines to run inside your own network), and you can leverage both if needed.

Your YAML configuration is then parsed, and each task defined in it is executed, much like an Ansible playbook.
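
For illustration, the trigger can also be scoped to specific branches or paths. The snippet below is a minimal sketch (the branch and path names are placeholders) that fires the pipeline only for commits on main that touch files under src/; with no trigger block at all, a push to any branch triggers a run:

trigger:
  branches:
    include:
      - main
  paths:
    include:
      - src/*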

Basic example

A basic pipeline configuration file typically specifies the agent or agent pool it should run on and how it should be triggered, followed by a set of steps called tasks which indicate what the running agent should do. Consider the YAML content below (derived from https://github.com/microsoft/azure-pipelines-yaml/blob/master/templates/empty.yml):

# Starter pipeline
# Start with a minimal pipeline that you can customize to build and deploy your code.
# Add steps that build, run tests, deploy, and more:
# https://aka.ms/yaml

# Uncomment this to disable auto-trigger with each commit pushed
# trigger: none

pool:
  vmImage: 'ubuntu-latest'

steps:
- bash: echo "Hello, world!"
  displayName: 'Hello world'

- bash: |
    sudo apt-get install -y jq
    which jq
  displayName: 'Fetch jq'

- bash: |
    curl https://jsonplaceholder.typicode.com/posts | jq '.[] | select(.userId == 2)'
  displayName: 'Sample multi-line script'

In this sample pipeline, the file is configured to run on the Microsoft-hosted Linux image ubuntu-latest, and it runs three tasks using the built-in bash task: a hello-world echo, a jq installation, and a multi-line script that fetches posts from a public API and filters them with jq.

Once this file is committed and the pipeline is triggered, a pipeline run is created, representing that execution of the pipeline. Within the run, the tasks are executed in the order they were written, on the specified agent (or an agent from the specified pool), and you can monitor progress or check the output by clicking any of the tasks shown on the run page. Adding the displayName property helps distinguish each task/job/stage in any pipeline.

This works as long as the agent can install the necessary packages and reach APIs on external networks. But what if that is not the case? What if you need to fetch data from an external API and process it with a sensitive tool, and that tool SHOULD NOT be hosted on a machine with internet access? See the next example.

Multi-stage example

You can also split a pipeline's tasks into multiple stages for various reasons. One practical example: you need to execute a series of tasks on a Microsoft-hosted agent, then run the next set of steps inside your private network (using self-hosted agents). Each stage can also contain multiple jobs; all jobs in the current stage run in parallel, but the tasks within each job still execute in the order they are written. Consider the following YAML content:

trigger: none

stages:
  - stage: PublicStage
    pool:
      vmImage: 'ubuntu-latest'
    displayName: Perform tasks in a host with internet access
    jobs:
    - job: GetJson
      displayName: Get Posts from API
      steps:
      - bash: |
          curl https://jsonplaceholder.typicode.com/posts -o payload.json
      - task: CopyFiles@2
        displayName: 'Copy payload to staging folder'
        inputs:
          SourceFolder: '$(System.DefaultWorkingDirectory)'
          Contents: 'payload.json'
          TargetFolder: '$(Build.ArtifactStagingDirectory)'
      - task: PublishPipelineArtifact@1
        inputs:
          targetPath: '$(Build.ArtifactStagingDirectory)'
          artifactName: 'json-payload'  
  - stage: PrivateStage
    displayName: Perform parsing in private agents
    jobs:
    - job: ParseInWindows
      pool: 'your-windows-agent-pool'
      displayName: Parse in Windows
      steps:
      - task: DownloadPipelineArtifact@2
        displayName: Get artifact
        inputs:
          artifact: 'json-payload'
          path: '$(Pipeline.Workspace)/'
      - script: jq '.[] | select(.userId == 2)' $(Pipeline.Workspace)/payload.json
    - job: ParseInLinux
      pool: 'your-linux-agent-pool'
      displayName: Parse in Linux
      steps:
      - task: DownloadPipelineArtifact@2
        displayName: Get artifact
        inputs:
          artifact: 'json-payload'
          path: '$(Pipeline.Workspace)/'
      - bash: |
          jq '.[] | select(.userId == 2)' $(Pipeline.Workspace)/payload.json

This example runs the same set of instructions as the previous one, with the difference that the tasks are split into two stages: the first stage fetches the JSON payload, and the second stage runs the JSON processing in parallel (two jobs, each running its own set of tasks) on pools of self-hosted machines behind a firewall.

Note the three new tasks introduced here:

  • CopyFiles@2 - Copies resource(s) from one location to another.
  • PublishPipelineArtifact@1 - Publishes resources as pipeline artifacts, making them accessible to later stages (and downloadable from the run page). This is what makes the fetched JSON payload available to the self-hosted agents in the second stage.
  • DownloadPipelineArtifact@2 - Downloads pipeline artifacts. The payload fetched and published by the first stage is retrieved with this task (a shorthand form of these artifact steps is sketched below).
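
As a side note, the two artifact tasks also have shorthand YAML steps. The snippet below is a minimal sketch (the path and artifact name mirror the example above) showing the equivalent publish and download keywords:

steps:
# Shorthand for PublishPipelineArtifact@1
- publish: $(Build.ArtifactStagingDirectory)
  artifact: json-payload

# Shorthand for DownloadPipelineArtifact@2; by default the artifact is placed
# under $(Pipeline.Workspace)/json-payload
- download: current
  artifact: json-payload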

Azure Pipelines provides a wide variety of tools for your automation needs. This includes build and test tasks for various platforms and programming languages (Android, Go, Java, JavaScript, etc.), as well as various utility tasks such as Bash and cURL file upload. You can also incorporate community-built extensions, like an Ansible task that lets you run Ansible playbooks. With these, you can practically automate everything you can think of, just be careful not to automate yourself out of a job (/joke).
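
For instance, a JavaScript build job might use the built-in NodeTool task. The snippet below is a minimal sketch (the Node.js version and npm commands are placeholders, not part of the examples above):

steps:
# Install a specific Node.js version on the agent
- task: NodeTool@0
  inputs:
    versionSpec: '18.x'
  displayName: 'Install Node.js'

# Then run the usual install and test commands
- script: |
    npm ci
    npm test
  displayName: 'Install dependencies and run tests'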

In the next article, we will explore how to customize pipeline runs by implementing runtime parameters and variables, as well as reusable templates.

For further information regarding Azure Pipelines, see the resources below.

Resources


About the author

Geo Dela Paz is a technical writer at OSSPH and a site reliability engineer at IBSS Manila. Feel free to connect with Geo on GitHub and LinkedIn.