At Fishtown Analytics, we’re frequently engaged by clients to build data ingestion pipelines: code that extracts data from a system and pushes it into a data warehouse.
Most of our clients use a dozen or more different systems to run their businesses—Salesforce, Shopify, Facebook Ads, etc. While off-the-shelf integrations cover the most common products, there is a large universe of systems that don’t have off-the-shelf integrations available yet.
When we get requests to build data ingestion pipelines, we almost always build them using Singer, Stitch’s open integration standard. Singer is a powerful way to write data integration jobs, called taps. It provides the core functionality needed by any application whose goal is to replicate data from a source to a destination on an incremental, scheduled basis. Common functionality provided by the protocol includes:
- Persistent bookmarks for incremental replication
- Authentication for common authentication schemes
- Support for common data formats
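To make the bookmark idea concrete, here is a minimal sketch of what a tap's output might look like. It uses only the Python standard library rather than any Singer helper package, and writes the protocol's three message types (SCHEMA, RECORD, STATE) as one JSON object per line. The `orders` stream, the sample rows, and the `bookmarks` layout inside the state are assumptions made for illustration; a real tap would pull rows from a vendor API and might structure its state differently.

```python
import json
import sys

# Hypothetical source rows; a real tap would fetch these from a vendor API.
SOURCE_ROWS = [
    {"id": 1, "updated_at": "2018-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2018-01-02T00:00:00Z"},
    {"id": 3, "updated_at": "2018-01-03T00:00:00Z"},
]

STREAM = "orders"  # hypothetical stream name


def build_messages(rows, state):
    """Return Singer messages for the rows newer than the saved bookmark."""
    # The nested "bookmarks" layout is an assumed convention, not a requirement.
    bookmark = state.get("bookmarks", {}).get(STREAM, {}).get("updated_at", "")
    messages = [{
        "type": "SCHEMA",
        "stream": STREAM,
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "updated_at": {"type": "string", "format": "date-time"},
            },
        },
    }]
    for row in sorted(rows, key=lambda r: r["updated_at"]):
        # Same-format ISO 8601 UTC timestamps compare correctly as strings.
        if row["updated_at"] > bookmark:
            messages.append({"type": "RECORD", "stream": STREAM, "record": row})
            bookmark = row["updated_at"]
    # The STATE message carries the bookmark forward to the next run.
    messages.append({
        "type": "STATE",
        "value": {"bookmarks": {STREAM: {"updated_at": bookmark}}},
    })
    return messages


def main(state=None):
    """Write each message to stdout as a single line of JSON."""
    for message in build_messages(SOURCE_ROWS, state or {}):
        sys.stdout.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    main()
```

If the value of the final STATE message is saved and passed back in as `state` on the next run, only rows updated after the bookmark are emitted again. That is the incremental replication behavior described above.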
What’s particularly critical, however, is that every Singer tap can be run within the Stitch platform. This is important because more than 80% of the cost associated with a data integration is in the maintenance phase. With your tap deployed on Stitch, you won’t have to worry about:
- hosting a server where jobs are run
- scheduling jobs
- viewing log output of jobs
- building notification systems to let you know if there are run failures
If the tap becomes an officially supported Stitch tap, you won’t even have to provide support for it. We’ve built several Stitch taps at this point that have become officially supported, and Stitch now supports this code just as they would support any of the integrations they built themselves. This is important as future vendor API changes inevitably require integration code changes.
All of these benefits add up to a massive long-term cost reduction. Ultimately, our clients don’t care about owning the drill; they just want the holes. This model fits their needs perfectly.
To get all of these benefits, every Stitch tap we build for our clients must be open-sourced. Unless the tap code is open-sourced, the tap can’t be incorporated into the Stitch platform and delivered like a managed service. Sometimes it feels strange for clients to pay for code that gets open-sourced (“the next person will get it for free!”), but ultimately, every single one of them has decided to be a part of the community rather than go it alone. The benefits are just too great.
We’re very bullish on the long-term dynamics of the Singer integration ecosystem and believe that it’s the very best way to solve the N-to-N data connectivity problem created by the profusion of SaaS products that businesses use today. We’re excited to be able to help kickstart it.