How we build and operate the Keboola data platform
Ondřej Hlaváček 5 min read

Terraform and Keboola Connection — A PayByPhone Integration story

As a developer, I use automated tests and deployment tools every day. The fact that our clients require a deeper integration of KBC…

Terraform and Keboola Connection — A PayByPhone Integration story
Phillip Wong and Cameron Stringer, Data Services team at PayByPhone

As a developer, I use automated tests and deployment tools every day. The fact that our clients require a deeper integration of KBC platform into their development processes, standards and pipelines does not surprise me. I was so pleased when I heard for the first time that PayByPhone developed their own Terraform provider (https://github.com/plmwong/terraform-provider-keboola). It felt like a dream. We’ve been trying to create a flexible platform — a toolbox for data needs, and the Terraform provider confirms its flexibility and expandability. Using Terraform is a neat way of incorporating Keboola Connection into clients’ development and deployment pipelines as part of their standard development process.

PayByPhone projects structure

PayByPhone

PayByPhone serves 1,200+ clients and helps them with reservations and payments for parking spots and with similar activities (everything that has a start and an end is considered a parking session). I have a personal experience with the older version of the app and the ecosystem. A couple of years ago, I parked in downtown Vancouver and was pleasantly surprised by how smoothly everything worked. It even notified me of my expiring reservation. The core of their app is a monolith application built around an Oracle server. A read replica is used to offload work for some reporting (which is now being replaced by a dashboard in GoodData, provided through Keboola Connection).

Infrastructure as a Code via Terraform

Terraform is used to define the infrastructure in AWS (DynamoDB, PostgreSQL and EC2), MS Azure, Google Cloud, OpenStack, etc. For PayByPhone, Keboola Connection is just another piece in their infrastructure. Phil’s strict infrastructure-as-code approach and his desire to try something in Go led to creation of the Terraform provider. A Keboola Connection project now spins up as easily as any other piece of the infrastructure. It can be easily reproduced and safely deployed. In order to develop the provider, Phil was looking mainly at what the UI does and started mimicking that behavior in the Terraform provider. Since the UI is only a thin layer on the top of API calls, everything available in there can be programmatically defined and replicated (https://developers.keboola.com/overview/api/). Yup, that’s the way it’s intended to be since we don’t have any advantage and use the same API that is open to the public.

Use Cases

From a top-level point of view, we think about the following possible use cases:

  • Seamless incorporation of KBC into the current coding practice (Infrastructure-as-code approach as a standard coding practice within a company)
  • Repository-based project management of the KBC project codebase
  • Project multiplication, templatization of the projects
  • Parameterization of the project creation (Let’s say an organization wants to manage 100 projects, each project for one client with the same codebase with a few project differences.)
  • Full fail-safe environment via immediate project restoration and the project state defined by the repository code
  • Full control over desired project state in mission critical cases (KBC as a backend and a critical part of a client’s product)

Defining a Keboola Connection Project

The KBC project definition is completely versioned in PayByPhone’s CVS. Data structure (buckets, tables, columns, just no data), transformations, orchestrations, and GoodData writer configurations — everyting is defined in code. Spinning up or updating a project is executed using Keboola Connection API calls; the project definition does not include project users nor initial Storage API tokens. This is done manually in the UI. The project id and tokens are then injected into the Terraform environment.

Note: At Keboola, we’re looking to support creation of tokens via the API too.

Project definition in Terraform

Project Lifecycle

PayByPhone has multiple KBC projects, each representing a different stage: development, Q&A, Production and others. Terraform can compare the current state of a project with its desired state and modify it accordingly. Cam prefers to manually click in the project and then wire the changes back to the code. Phil operates directly on the code. Both approaches work perfectly fine. If the change passes from development, through Q&A (they perform some manual tests to prevent any major damage to the project), the version is deployed to production. All with a single command and within seconds. How convenient!

In the PBP case, the project’s core (which is tracked in Terraform) does not change every day. Changes are infrequent, maybe a couple of releases a month. As you have already figured out it’s best if the changes are not destructive (e.g., deleting a column in a table). If a change requires some backfilling (e.g., of a new column), an ad-hoc transformation is created in the production project and modifies the required data.

Data Ingestion, Processing and Reporting

PayByPhone does not use extractors. They send their data directly to the storage via an API call (https://developers.keboola.com/integrate/storage/api/importer/) by using a windows service running extraction jobs from the Oracle database directly into Keboola Connection Storage. Triggers and scripts in Oracle detect added rows or changes in rows and create a summary that is sent to KBC. All data is stored incrementally in a Snowflake database. Upon a set of transformations, the GoodData writer loads data into their GoodData project which powers one part of their back office reporting. The Snowflake writer (with a dedicated warehouse) is used in their back office for Operational Reporting. Both dashboards and operational reporting are standard features offered to every PBP client.

AdHoc Operations and Reporting

Most of the ad-hoc operations and data manipulation happens in the production project. Terraform happily ignores all assets in the project that it’s not told to track.

Terraform compares the current and desired state of a project, ignoring all untracked resources

Hurdles and what is currently missing

I was really curious what the main hurdles of the integration were since we want to make it as smooth and easy as possible. Phil mentioned two major issues:

  • API requests inconsistency. Storage API accepts a payload using form-data, all other APIs use JSON. We are aware of this issue and will be addressing it (without breaking changes, of course).
  • GoodData writer configurations cannot reuse the same name. Phil adds:
The reason this is an issue is that it makes tearing down and re-creating a GoodData writer more difficult, because we need to either manually change the ‘name’ ourselves, or use a random_id resource (https://www.terraform.io/docs/providers/random/r/id.html) to generate one each time the resource needs to be created.

As PayByPhone does not use any extractor, its support in the current provider is currently missing. However, each component is just a resource from the Terraform point of view, so adding it should not be an issue. Also, storing sensitive data (passwords, tokens) is not supported, just because components used by PayByPhone don't require it. For example, the GoodData writer has its own API that will create a GoodData project and handle the passwords itself.

The Terraform provider for Keboola Connection is a public repository (https://github.com/plmwong/terraform-provider-keboola). We hope to see more forks and pull requests with additional functionality.

Big thanks to Phil and Cameron from PayByPhone, Fisa for filling my memory gaps and Hanka for spellchecking.

If you liked this article please share it.

Comments ()

Read next

MySQL + SSL + Doctrine

MySQL + SSL + Doctrine

Enabling and enforcing SSL connection on MySQL is easy: Just generate the certificates and configure the server to require secure…
Ondřej Popelka 8 min read