Terraform Provider Development Demystified

Many Terraform practitioners may be unfamiliar with provider development. How are providers actually implemented? The following offers an outline of a brief presentation I gave to the HBO Max Strategic Global Infrastructure team.

Review of the basics

First, let’s establish a foundation, especially for those who may be less familiar with Terraform.

Terraform fundamentals

Terraform enables users to describe infrastructure resources – and their dependency relationships – in .tf files using HCL, and to automate the creation and ongoing management of that infrastructure via the Terraform command line interface.

HCL configurations often describe resources associated with cloud infrastructure services such as AWS, OpenStack, or Kubernetes, but they might also describe less cloudy resources, such as local files. For example, the following configuration creates a DigitalOcean droplet, a DNSimple A record, and a local file documenting the resulting droplet’s IP address:

resource "digitalocean_droplet" "web" {
  name   = "tf-web"
  size   = "512mb"
  image  = "centos-5-8-x32"
  region = "sfo1"
}

resource "dnsimple_record" "hello" {
  domain = "example.com"
  name   = "test"
  value  = digitalocean_droplet.web.ipv4_address
  type   = "A"
}

resource "local_file" "ip_address" {
  content  = digitalocean_droplet.web.ipv4_address
  filename = "${path.module}/ip_address.txt"
}

When invoked against a configuration (i.e. a collection of resources specified as HCL in *.tf files like the example above) via the terraform plan and/or terraform apply CLI commands, Terraform builds a dependency graph of resource attributes and relationships and analyzes…

  1. What has been specified in the *.tf files?
  2. How does that compare to what’s been captured in Terraform state?
  3. How does all that compare to what may or may not actually exist, as reported by the resources’ corresponding APIs?

Based on its analysis, Terraform decides the order in which it must invoke the necessary CRUD actions (“create,” “read,” “update,” or “delete,” which Terraform calls “destroy”) against the resources’ APIs in order to produce the desired state, as specified in HCL configuration in *.tf files. I often refer to this logic as the “Terraform lifecycle algorithm,” though I may have made up that terminology; I don’t know whether the Terraform maintainers would endorse it, but I find it helpful.

Terraform providers

In Terraform parlance, resources (such as an individual DNS record) are associated with providers (such as DNSimple or AWS Route 53). When declaring a resource in HCL in a .tf file, the provider name appears as the ${provider_name}_ prefix on the resource type. In the following example, Grafana is the provider associated with a folder resource:

resource "grafana_folder" "foo" {
  uid   = "foo"
  title = "Terraform Folder With UID foo"
}

As mentioned above, providers often correspond to cloud infrastructure services such as AWS, OpenStack, Fastly, etc., but might also correspond to…

  • SaaS platforms such as Grafana, Heroku, GitHub, Okta, etc.
  • a local file system, etc.

Generally speaking, a provider (and its underlying resource(s)) could be anything that can be modeled declaratively and has some sort of corresponding CRUD API(s).

Providers are decoupled from the Terraform CLI itself as independent software components, often versioned, compiled, and published to the Terraform registry via their own CI/CD processes. Typically, a Terraform provider’s source code lives in a git repository conforming to the terraform-provider-${PROVIDER} naming convention.

Providers are generally authored in Go using Terraform’s plugin SDK. So, how does this work?

Implementing a provider

A provider is configured within a Terraform configuration via a provider "some_provider" {} block. For example, the AWS provider might be configured like…

provider "aws" {
  version = "3.40"
  region  = "us-east-1"
}

Assuming the use of the plugin SDK, a provider is implemented as a *schema.Provider on which a few key fields are specified, most notably fields like…

  • Schema - a map[string]*schema.Schema specifying the supported provider arguments and attributes. See schema.Schema documentation for more info.
  • ResourcesMap - a map[string]*schema.Resource specifying the supported resources and their related functions
  • DataSourcesMap - a map[string]*schema.Resource specifying the supported data sources and their related functions
  • ConfigureFunc - a ConfigureFunc that configures the provider, often creating and returning an API client based on the provider configuration defined in .tf via the provider "some_provider" {} HCL.


Resources

Individual provider resources are managed via functions that return a *schema.Resource on which a few key fields are specified, most notably fields like…

  • Description - a description of the resource used to generate documentation
  • Create - a CreateFunc for creating the resource via the provider API
  • Read - a ReadFunc for reading the resource via the provider API and refreshing its Terraform state
  • Update - an UpdateFunc for updating the resource via the provider API when its configuration changes
  • Delete - a DeleteFunc for deleting the resource via the provider API if/when it’s removed from configuration
  • Schema - a map[string]*schema.Schema specifying the supported resource arguments and attributes. See schema.Schema documentation for more info.
  • etc.

Generally, each of the individual CRUD functions accepts 2 arguments:

  1. a *schema.ResourceData - representing the resource’s Terraform configuration and state.
  2. an interface{} - a generic interface, often housing an API client configured by the provider and used to interact with the provider’s APIs.

Each of the CRUD functions interacts with the appropriate provider APIs to create, read, update, or delete the corresponding resource. Each function is also responsible for updating Terraform state to reflect the result. However, it’s business logic codified within Terraform itself – and not the provider codebase – that decides which of the CRUD functions to invoke when, via the aforementioned “Terraform lifecycle algorithm.”


Data sources

Terraform data sources enable Terraform to read outside information. Unlike resources – which enable Terraform to create, update, and delete resources – data sources offer read-only functionality.

Assuming the use of the plugin SDK, individual provider data sources are managed via functions that return a *schema.Resource, similar to that returned by resource functions. However, a data source’s *schema.Resource typically only specifies:

  • Description - a description of the data source used to generate documentation
  • Read - a ReadFunc for reading the data specified in configuration via the associated provider API
  • Schema - a map[string]*schema.Schema specifying the supported data source arguments and attributes. See schema.Schema documentation for more info.


Tying it together

In summary, provider implementation is largely composed of boilerplate-ish configuration code implementing the above-described types. Most of the provider-specific business logic is confined to the individual CRUD functions associated with individual resources, and is itself mostly focused on provider API interaction and the surrounding reading and writing of Terraform state. Finally, a main.go provides the entry point for the provider program.


GNUmakefile

Most provider codebases feature a GNUmakefile in which various build and test commands are specified.
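As an illustration, a minimal GNUmakefile might look something like the following; the exact targets and flags vary by provider, and this sketch is modeled loosely on HashiCorp’s provider scaffolding:

```make
default: build

build:
	go build ./...

test:
	go test ./...

# Acceptance tests exercise real APIs; setting TF_ACC=1 opts in.
testacc:
	TF_ACC=1 go test ./... -v -timeout 120m
```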

Testing

While individual functions can be unit tested in isolation, the plugin SDK provides acceptance testing utilities (most notably resource.TestCase) used to author acceptance tests against provider, provider resource, and provider data source functionality.

Generally, these acceptance tests are configured to interact with real APIs associated with a given cloud provider, though some Terraform providers – particularly those that target open source platforms or SaaS APIs – may configure acceptance tests to interact with localhost-served APIs enabled via tools such as Docker. For example, terraform-provider-grafana’s own acceptance testing utilizes a local Grafana established via docker-compose in local development, as well as remote instances of Grafana in CI/CD.

To run terraform-provider-grafana’s acceptance tests locally, install Go and Docker, then clone the repository:

git clone git@github.com:grafana/terraform-provider-grafana.git

…and run the acceptance tests against a local Docker-established Grafana:

make testacc-docker


Building

Typically, goreleaser is used to compile provider binaries across platforms and publish them as versioned GitHub releases. terraform-provider-grafana’s .goreleaser.yml offers an example of a goreleaser configuration used to build and publish its own GitHub releases.

Often, tfplugindocs is integrated with a provider’s build process to automate the generation of Markdown documentation for the provider; these documents are typically committed to a docs directory in source control.

Releasing

Assuming the provider and its associated GitHub releases conform to the registry’s publishing requirements, the provider can be published to the Terraform registry. From there, it’s available for use and can be downloaded by Terraform configurations in which it’s referenced and configured.
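For example, a configuration consuming the published Grafana provider declares it via a required_providers block (the version constraint below is illustrative):

```hcl
terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = ">= 1.13.0"
    }
  }
}
```

When terraform init runs against such a configuration, Terraform downloads the matching provider release from the registry.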

Learn more

In my experience, the codebase of an existing provider offers the best way to learn more, particularly that of a simple provider (as opposed to terraform-provider-aws, whose codebase is large and daunting). Maybe it’s worth taking a look at something like terraform-provider-dominos? For me, git clone-ing provider code locally, running tests, and looking for simple areas of improvement has taught me a lot. Maybe that’s helpful for you too?

It’s also worth looking at HashiCorp’s own learning resources.

Do you see an inaccuracy or typo in this post? Submit a pull request.