August 2023 Update: Unifying Your Data Sources

Community Tech Alliance
6 min read · Aug 30, 2023


It’s the dog days of summer but we’ve got that back-to-school feeling, the kind that makes you want to buy stacks of new notebooks and bouquets of sharpened pencils.

With summer winding down, our eyes are on the cycle ahead and how we can help partners like you simplify your tough data challenges.

For some ideas on how we can help, check out the case study below from our work with Voter Formation Project, bringing together data sources from across the ecosystem. And, check out how CTA…uses CTA!

Bringing Together Data Sources from Across the Internet

Case Study: Voter Formation Project

Do you have important data in Google Sheets, different ad platforms, and across social channels?

That was the challenge Voter Formation Project found themselves facing last cycle, as they worked tirelessly to register and mobilize new and infrequent voters of color.

With data dispersed across different platforms, it was hard to find programmatic insights and make real-time updates — both critical to their work.

That’s where CTA jumped in to help. In partnership with VFP, the CTA team worked to architect an integrated data infrastructure that brought these disparate sources together, allowing the VFP team time to focus on analyzing and optimizing their program, rather than figuring out how it all came together.

Read more about how VFP streamlined their data (and what lessons might apply to your program).

How CTA Uses CTA (or, “how data can optimize itself!”)

Our mission is about delivering timely, secure, and reliable data to our partner organizations — so it only makes sense that we’d use our own tools, as well.

In a nutshell, CTA uses our data to monitor our data. Data is the currency of optimizing all things, including itself! Here’s how we do it:

  • Monitoring of Data Pipelines: First, we use our own data in PAD to monitor the status of the many data pipelines we orchestrate in Google Cloud Composer. What data, you might be asking? As part of our ELT process (“Extract, Load, and Transform”), we use dbt to transform data, run unit tests to ensure data quality, and deliver it into PAD accounts. While running dbt, we use Elementary, a tool that works with dbt to collect data about each run and send that data into our own BigQuery project. The data Elementary provides is a treasure trove for observing our own data pipelines. For example, Elementary tells us how many errors or failures there were when running dbt models and tests, even reporting those failures automatically to Slack so we’re alerted right away.
  • Visualizing the Data: We then connect that data to Looker Studio to generate a report we call our “Pipeline Health Dashboard” (or, by its cheeky acronym, “PHD”). The report gives us a clean and crisp view of all the syncs we run each day, flags any errors that happened, and even displays the text of each error message.
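As a sketch of what that monitoring looks like in practice, a query like the one below could surface yesterday’s dbt test failures from the results Elementary lands in BigQuery. (We’re assuming Elementary’s `elementary_test_results` table and its `table_name`, `status`, and `detected_at` columns here — check your own Elementary schema before running it.)

```sql
-- Count yesterday's failing dbt tests per table, using the results
-- Elementary writes to BigQuery (table/column names assumed)
SELECT
  table_name,
  COUNT(*) AS failed_tests
FROM
  `your_project_id.elementary.elementary_test_results`
WHERE
  status IN ('fail', 'error')
  AND DATE(detected_at) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
  table_name
ORDER BY
  failed_tests DESC;
```

A query like this is the kind of building block that feeds a dashboard such as the PHD, or a scheduled alert.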

If you’re anything like us, this might be getting you excited about monitoring your own data. The good news is that BigQuery gives you everything you need to get started right away! Every dataset in BigQuery has a set of views, called INFORMATION_SCHEMA views (such as INFORMATION_SCHEMA.TABLES), that you can query to access basic metadata about your data, plus a hidden `__TABLES__` metadata table that records when each table was last modified. For example, you can measure data freshness using a query like this:

SELECT
  table_id,
  -- last_modified_time is stored as epoch milliseconds, so convert it
  TIMESTAMP_MILLIS(last_modified_time) AS last_updated_timestamp,
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), TIMESTAMP_MILLIS(last_modified_time), MINUTE) AS minutes_since_last_updated
FROM
  `your_project_id.your_dataset_name.__TABLES__`
ORDER BY
  last_updated_timestamp DESC;

There are lots of cool things you can look at using the INFORMATION_SCHEMA views; you can read more about them here.
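For instance, the INFORMATION_SCHEMA.COLUMNS view lets you audit the schema of every table in a dataset at once (the project and dataset names below are placeholders):

```sql
-- List every column in a dataset, with its type and nullability
SELECT
  table_name,
  column_name,
  data_type,
  is_nullable
FROM
  `your_project_id.your_dataset_name.INFORMATION_SCHEMA.COLUMNS`
ORDER BY
  table_name, ordinal_position;
```

This kind of query is handy for spotting schema drift — say, a column whose type changed after an upstream sync was reconfigured.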

All of CTA’s internal data monitoring is powered by data we deliver into, and retrieve from, BigQuery. We love the reliability, convenience, and flexibility of BigQuery, which is why it’s our data platform of choice both for ourselves and PAD users.

CTA Team Spotlight: James Copeland, Product Director

Get to know the faces behind the data.

Hi James! What brought you to CTA?
Candidly, it was two things: in my first interview with Kat, she described her vision of CTA offering data infrastructure as a public utility, which was exactly what I was looking for coming out of the corporate space. Then, in my first interview with Fisher (CTA’s CTO), I asked what he was looking for in a Product Director, and his response was: “A best friend.” They both hooked me.

In 5 (ish) words, what do you actually do at CTA?
Facilitate the development and release of products and features.

What’s your proudest work accomplishment?
Our roadmapping and planning process keeps us flexible and allows CTA to respond to partner needs, while also developing Product features. It’s an ever-evolving process that we’re regularly tweaking and retroing. I’m really excited that we decided so early on to make this a priority.

Meet James, our Product Director.

What’s something on your desk that inspires you?
I have PROG sitting next to me and that keeps me pretty inspired.

What’s your favorite ice cream flavor?
Mint chocolate chip!

Any hidden talents?
If a song was in the top 20 in the last 40 years, I can probably sing it word for word. Is that a talent?

Notes from BenDesk*

*Ben is our resident ZenDesk captain and manager of all help@ inquiries. We’re bringing you interesting inquiries from his inbox each month to help share learnings across our community.

Question of the Month: Is it possible to incorporate programming elements like variable settings, IF statements, or loops into my SQL queries in BigQuery?

BenDesk Answer:
Absolutely! Procedural SQL enables you to integrate programming elements into your queries through the BigQuery UI. Procedural SQL can be used for various tasks, such as:

  • Running multiple SQL statements together as one query
  • Setting a default project ID and/or dataset name when creating a series of tables so you don’t have to type them out each time
  • Using IF/THEN logic in your queries to answer questions such as “Has this table been updated with today’s data yet?” or to set up branching options for a query based on certain conditions
  • Looping through lists of variables for the same query. For example, if you need to run the same query with a different variable in the WHERE clause several times, you can loop through variables to speed up the process and avoid writing out the query multiple times.
  • And many other applications!
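To make a couple of those patterns concrete, here’s a sketch using BigQuery’s scripting statements (DECLARE, WHILE, and IF). The table and column names are hypothetical — swap in your own:

```sql
-- Sketch: loop over a list of states, running the same count for each
DECLARE states ARRAY<STRING> DEFAULT ['GA', 'AZ', 'NV'];
DECLARE i INT64 DEFAULT 0;

WHILE i < ARRAY_LENGTH(states) DO
  SELECT
    states[OFFSET(i)] AS state,
    COUNT(*) AS registrations
  FROM
    `your_project_id.your_dataset_name.registrations`  -- hypothetical table
  WHERE
    state = states[OFFSET(i)];
  SET i = i + 1;
END WHILE;

-- Sketch: IF/THEN check for whether a table has today's data yet
IF (
  SELECT DATE(MAX(created_at))  -- hypothetical timestamp column
  FROM `your_project_id.your_dataset_name.registrations`
) = CURRENT_DATE() THEN
  SELECT "Table is up to date";
ELSE
  SELECT "Table has not been updated today";
END IF;
```

You can paste a script like this straight into the BigQuery query editor and run it as a single job.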

If you want to start using procedural SQL, you can run it just like a regular SQL query through the BigQuery UI. For more information and code examples, take a look at our procedural SQL documentation here, or check out Google’s documentation here.

What We’re Reading

  • Portrait of a Thief by Grace D. Li: Ocean’s Eleven meets The Farewell in Portrait of a Thief, a lush, lyrical heist novel inspired by the true story of Chinese art vanishing from Western museums; about diaspora, the colonization of art, and the complexity of the Chinese American identity.
  • None of This Is True by Lisa Jewell: A famous podcaster ends up as the subject of her own true crime podcast and that’s all we have to say about that.
  • Georgie, All Along by Kate Clayborn: The perfect coming-of-age story for anyone who has ever had a quarter-life crisis and found themselves suddenly living in their childhood bedroom.


Community Tech Alliance

Empowering the progressive community through smart data management.