Editor’s Note: This is the fifth in a series of blog posts by Google Certified Trainers Ben Finkel and Garth Schulte that will explore Google Cloud Platform.
BIG DATA has been piquing the interest of business and data professionals since its inception. Terabytes and petabytes of data generated by users, infrastructures, devices, and systems can provide insight so valuable that it’s no wonder this hot topic is still, well… HOT! But let’s be honest: working with big data is no small task at any level, as it requires massive investments in infrastructure, development, and personnel, which limit its accessibility.
Google is the forefather of big data as we know it. It ran into the problem while indexing the internet for its web search product and solved it from the ground up, engineering the methods and techniques that underlie virtually every big data technology today. Enter Google Cloud Platform! GCP dramatically lowers the barrier to entry for big data and brings it to the masses by exposing the very technologies powering Google’s products through services such as BigQuery, Cloud Dataflow, and Cloud Pub/Sub. Let’s demystify each one!
BigQuery is a fully managed big data storage and analysis service without the infrastructure requirements of your typical big data solution, which translates to very low startup and operating costs. It’s incredibly simple to work with BigQuery, from loading or streaming large quantities of data into familiar table-like structures, to asking questions with its SQL-like query language (no hardcore MapReduce programming here, my friends!), to pulling or exporting results outside of BigQuery for analysis. It’s also blazing fast; millions of records queried in seconds is the norm for BigQuery!
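To give a feel for what "asking questions" of table-like data looks like, here is a minimal sketch. It does not call BigQuery at all: Python's built-in sqlite3 module stands in as a local engine, and the `page_views` table and its rows are made up for illustration. The point is only the shape of the workflow: load rows into a table, then ask an aggregate question in SQL.

```python
import sqlite3

# sqlite3 is only a local stand-in here; BigQuery itself is a hosted service.
conn = sqlite3.connect(":memory:")

# Hypothetical table and sample rows, invented for this example.
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("US", 120), ("DE", 45), ("US", 80), ("JP", 60)],
)

# Ask a question in SQL: total views per country, highest first.
query = """
    SELECT country, SUM(views) AS total_views
    FROM page_views
    GROUP BY country
    ORDER BY total_views DESC
"""
results = conn.execute(query).fetchall()

for country, total_views in results:
    print(country, total_views)
```

An aggregate query like this is the kind of question the paragraph above has in mind, just at a vastly smaller scale than a real big data table.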
Cloud Dataflow is a fully managed data processing service that supports everything from ETL (Extract, Transform, Load) to batch and even stream processing in your data pipelines. It sits outside of other GCP data services such as BigQuery, Cloud Storage, Cloud Bigtable, Cloud Pub/Sub, and Cloud Datastore, providing us with a unified programming model across them. It also gives us the ability to centralize all of our data operations. DataOps is upon us! How cool is that?!?
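The idea of one programming model for both batch and streaming data can be sketched in a few lines of plain Python. This is not the Cloud Dataflow SDK; the function names and the `user,amount` record format are assumptions made up for this example. The same transform is written once and then fed a finite list (batch) and a generator that plays the role of a live stream.

```python
from typing import Iterable, Iterator


def extract_transform(records: Iterable[str]) -> Iterator[dict]:
    """Parse 'user,amount' lines and keep only valid, positive amounts."""
    for line in records:
        user, _, amount = line.partition(",")
        try:
            value = float(amount)
        except ValueError:
            continue  # skip malformed rows
        if value > 0:
            yield {"user": user.strip(), "amount": value}


def event_stream() -> Iterator[str]:
    """Hypothetical streaming source, cut short here so the example ends."""
    yield "carol,7.25"
    yield "dave,oops"
    yield "erin,3"


# Batch: a finite collection that is fully available up front.
batch_input = ["alice,10.5", "bob,-2", "bad row"]
batch_output = list(extract_transform(batch_input))

# Stream: records arrive one at a time, yet the transform is unchanged.
stream_output = list(extract_transform(event_stream()))

print(batch_output)
print(stream_output)
```

Because the transform only assumes it can iterate over records, it does not care whether the source is bounded or not, which is the essence of a unified model.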
Cloud Pub/Sub is a fully managed service for connecting applications, services, and data streams through reliable real-time asynchronous messaging. Think of it as the middleware component for passing data between applications and synchronizing distributed systems, no matter where they’re hosted, which means integrating systems from inside or outside of GCP has just become a whole lot easier. So if you’re looking for a reliable way to pass those small chunks of data generated from user devices to your big data processes, Cloud Pub/Sub is your huckleberry!
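To picture how asynchronous messaging decouples the sender from the receivers, here is a toy in-memory broker. It is a sketch of the publish/subscribe pattern only, not the Cloud Pub/Sub client library; the `InMemoryBroker` class, the `device-events` topic, and the subscription names are all hypothetical.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy stand-in for a messaging service: topics fan out to subscriptions."""

    def __init__(self):
        # topic name -> {subscription name: queue of pending messages}
        self._subscriptions = defaultdict(dict)

    def subscribe(self, topic, subscription):
        self._subscriptions[topic].setdefault(subscription, deque())

    def publish(self, topic, message):
        # The publisher returns right away; it never waits for consumers.
        for queue in self._subscriptions[topic].values():
            queue.append(message)

    def pull(self, topic, subscription):
        # Each subscription consumes its own copy at its own pace.
        queue = self._subscriptions[topic][subscription]
        return queue.popleft() if queue else None


broker = InMemoryBroker()
broker.subscribe("device-events", "analytics")
broker.subscribe("device-events", "archive")

# A user device publishes one small chunk of data.
broker.publish("device-events", {"device": "phone-1", "temp_c": 21.5})
```

The publisher never needs to know who is listening: any number of subscribers can later pull their own copy of the message whenever they are ready.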
These GCP products and services provide us with everything needed to create a big data solution, and more, while keeping time and cost in check. As with everything on Google Cloud Platform, they come packaged with built-in reliability, scalability, and the peace of mind of knowing your big data solutions are running on the same world-class infrastructure and technology that powers Google’s own products.
Want to explore GCP further? Browse our entire library of Google Cloud training!