Fivetran, Airbyte, and Stitch are three popular Extract + Load data integration tools that enable organizations to seamlessly move data from one source to another. In the data engineering industry, we sometimes refer to these applications as “lift-and-shift” tools because they lift the data from one place and shift it to another but do not perform any transformations such as cleaning, joining, or aggregating the data. In this article, we will compare the features, strengths, and weaknesses of these three tools to help you make an informed decision about which one is right for your organization. We will also provide a simple demo of each tool moving a table of data from Google Sheets into Google BigQuery.

Fivetran

Founded in 2012, Fivetran is the oldest and most well-known of these three tools. It is a cloud-based data integration platform that focuses on simplicity and ease of use. Fivetran supports over 150 different data sources, including popular ones like Salesforce, Google Analytics, and Shopify. Fivetran automatically extracts data from your sources, transforms it into a usable format, and loads it into your destination in real-time. It also has built-in support for data warehousing solutions like Snowflake, Amazon Redshift, and Google BigQuery. Fivetran is known for its reliability, performance, and security features including end-to-end encryption. In Fivetran, you begin by creating a destination for your data. The process is rather simple, and Fivetran’s excellent documentation guides you through every step. Fivetran provides you with a Google service account which you then assign the “BigQuery User” role in Google Cloud’s IAM and Admin. You provide the data’s geographic location, the name of your cloud service provider, and your time zone, and Fivetran connects to the destination. Then you need to create your first connector. We select the Google Sheets connector and arrive at the configuration screen which is once again very well documented with a helpful setup guide. We allow Fivetran to access our Google account, provide it with a destination schema name and table name, and then give it our Google Sheet URL and a named range to select the data from our sheet. From here our connection is fully established and we are ready to begin our initial sync. We can also adjust the schema by blocking or hashing columns, and we can set up a schedule for the integration to continue syncing automatically so that our warehouse stays up to date as our data evolves. All things considered, it is quite a quick and simple process to set up an integration with Fivetran. After syncing, our table is neatly transferred to its destination, our Google BigQuery data warehouse.

Airbyte

Airbyte is a relatively new (2020) open-source data integration platform that has gained a lot of attention in the industry. It has a growing list of over 200 connectors, including both open-source and commercial data sources. Airbyte is built with extensibility in mind and allows users to write custom connectors or modify existing ones to suit their needs. Airbyte is also designed to be lightweight and easy to deploy, making it a good choice for organizations with limited resources. However, Airbyte does not have some of the advanced features that Fivetran and Stitch offer, such as automatic schema management and data validation. In Airbyte, we begin by creating our source, Google Sheets in our case. Similar to Fivetran, we have excellent documentation guiding us on the right side of the screen. We allow Airbyte to access our Google account and provide it with our Google Sheet URL. Next we set up our destination, Google BigQuery. In Airbyte, this step requires slightly more know-how than in Fivetran. Instead of providing us with a service account email address which we then grant permissions, we need to create the service account ourselves, generate a JSON service account key, and paste this key into Airbyte. Once this is done, We are almost ready to sync our data. First we are required to schedule our syncs, and because we didn’t provide a named range for our data like in Fivetran, we instead specify the specific pages within our Google Sheet that we want to create data streams from. When we continue, our data immediately begins to sync. And finally lands in BigQuery.

Stitch

Stitch is a cloud-based data integration platform founded as a spinoff of RJMetrics in 2017 and acquired by Talend in 2018. Stitch focuses on flexibility and customization. It supports over 100 different data sources and has a wide range of destination options, including data warehouses, databases, and cloud storage. Stitch offers a variety of transformation options, including SQL-based transformations, custom code, and third-party ETL tools. It also has robust monitoring and alerting features, making it easy to track the health of your data pipelines. However, Stitch can be more complex to set up and configure than Fivetran and Airbyte, and it may not be the best choice for organizations that need a quick and simple solution. In Stitch, we again begin by setting up our destination. Like Airbyte, Stitch requires you to create a new Google service account which it will use to stream data to BigQuery. In the destination setup, Stitch gets slightly more technical and customizable. It asks about the data loading behavior you want to use for the destination and gives you the options of upserting or appending data. Next comes the source setup. This time we provide a file ID instead of the entire sheet URL. The file ID is a piece of the URL, and honestly Stitch doesn’t do a great job of specifying what it wants here, but with a bit of googling it is easy to figure out. We also specify a desired sync frequency, and Stitch gives us the option to specify a cutoff date if we don’t want to sync old data from our sheet. After specifying the schema we are interested in from our Google Sheet, Stitch begins loading our integration. Although Stitch may be the most technically deep tool of these three, setting up an integration was ultimately quite simple and required basically the same low level of prior knowledge as Fivetran and Airbyte.

Finally, our table touches down in Google BigQuery. In conclusion, Fivetran, Airbyte, and Stitch are all excellent data integration tools, each with its own set of strengths and weaknesses:

  • If you need a reliable and easy-to-use solution, Fivetran is a great choice.
  • If you value flexibility and extensibility, Airbyte may be the right tool for you.
  • And if you need advanced transformation capabilities and customization options, you should consider Stitch.

Ultimately, all these tools are very effective, and the best tool for your organization will depend on your specific needs and requirements. If you are looking for help specifying these requirements, or you have project plans for integrating a lift-and-shift tool into your architecture we are ready to support you in this effort. Don’t hesitate to contact us by email or phone, we are always happy to help!