Are you sure you want to cancel your subscription?

If you cancel, your subscription will remain active through the paid term. You will be able to reactivate the subscription until that date.

Sorry to see you go

Your subscription will remain active until . If you change your mind, you may reactivate your subscription anytime before that date.

Are you sure you want to reactivate?
Welcome Back!

Your subscription has been reactivated and you will continue to be charged on .

Reactivate Subscription

Thank you for choosing to reactivate your subscription. In order to lock in your previous subscription rate, you owe: .

Your Subscription term is from - .

Questions? Call Sales.

Payment Due:

Auto-Renew Subscription

To auto-renew your subscription you need to select or enter your payment method in "Your Account" under Manage Payments.

Click continue to set up your payments.

CBT Nuggets License Agreement

Unless otherwise stated, all references to “training videos” or to “videos” include individual videos within a series, entire series, series packages, and streaming subscription access to CBT Nuggets content. All references to CBT or CBT Nuggets shall mean CBT Nuggets LLC, a Delaware limited liability company located at 44 Country Club Road, Ste. 150, Eugene, Oregon.

A CBT Nuggets license is defined as a single user license. Accounts may purchase multiple users, and each user is assigned a single license.

  • GRANT OF LICENSE. CBT Nuggets grants you a non-transferable, non-exclusive license to use the training videos contained in this package or streaming subscription access to CBT content (the “Products”), solely for internal use by your business or for your own personal use. You may not copy, reproduce, reverse engineer, translate, port, modify or make derivative works of the Products without the express consent of CBT. You may not rent, disclose, publish, sell, assign, lease, sublicense, market, or transfer the Products or use them in any manner not expressly authorized by this Agreement without the express consent of CBT. You shall not derive or attempt to derive the source code, source files or structure of all or any portion of the Products by reverse engineering, disassembly, decompilation or any other means. You do not receive any, and CBT Nuggets retains all, ownership rights in the Products. The Products are copyrighted and may not be copied, distributed or reproduced in any form, in whole or in part even if modified or merged with other Products. You shall not alter or remove any copyright notice or proprietary legend contained in or on the Products.
  • TERMINATION OF LICENSE. Once any applicable subscription period has concluded, the license granted by this Agreement shall immediately terminate and you shall have no further right to access, review or use in any manner any CBT Nuggets content. CBT reserves the right to terminate your subscription if, at its sole discretion, CBT believes you are in violation of this Agreement. CBT reserves the right to terminate your subscription if, at its sole discretion, CBT believes you have exceeded reasonable usage. In these events no refund will be made of any amounts previously paid to CBT.
  • DISCLAIMER OF WARRANTY AND LIABILITY. The products are provided to you on an “as is” and “with all faults” basis. You assume the entire risk of loss in using the products. The products are complex and may contain some nonconformities, defects or errors. CBT Nuggets does not warrant that the products will meet your needs, expectations or intended use, that operations of the products will be error-free or uninterrupted, or that all nonconformities can or will be corrected. CBT Nuggets makes and user receives no warranty, whether express or implied, and all warranties of merchantability, title, and fitness for any particular purpose are expressly excluded. In no event shall CBT Nuggets be liable to you or any third party for any damages, claim or loss incurred (including, without limitation, compensatory, incidental, indirect, special, consequential or exemplary damages, lost profits, lost sales or business, expenditures, investments, or commitments in connection with any business, loss of any goodwill, or damages resulting from lost data or inability to use data) irrespective of whether CBT Nuggets has been informed of, knew of, or should have known of the likelihood of such damages. This limitation applies to all causes of action in the aggregate including without limitation breach of contract, breach of warranty, negligence, strict liability, misrepresentation, and other torts. In no event shall CBT Nuggets’ liability to you or any third party exceed $100.00.
  • REMEDIES. In the event of any breach of the terms of the Agreement CBT reserves the right to seek and recover damages for such breach, including but not limited to damages for copyright infringement and for unauthorized use of CBT content. CBT also reserves the right to seek and obtain injunctive relief in addition to all other remedies at law or in equity.
  • MISCELLANEOUS. This is the exclusive Agreement between CBT Nuggets and you regarding its subject matter. You may not assign any part of this Agreement without CBT Nuggets’ prior written consent. This Agreement shall be governed by the laws of the State of Oregon and venue of any legal proceeding shall be in Lane County, Oregon. In any proceeding to enforce or interpret this Agreement, the prevailing party shall be entitled to recover from the losing party reasonable attorney fees, costs and expenses incurred by the prevailing party before and at any trial, arbitration, bankruptcy or other proceeding and in any appeal or review. You shall pay any sales tax, use tax, excise, duty or any other form of tax relating to the Products or transactions. If any provision of this Agreement is declared invalid or unenforceable, the remaining provisions of this Agreement shall remain in effect. Any notice to CBT under this Agreement shall be delivered by U.S. certified mail, return receipt requested, or by overnight courier to CBT Nuggets at the following address: 44 Club Rd Suite 150, Eugene, OR 97401 or such other address as CBT may designate.

CBT Nuggets reserves the right, in its sole discretion, to change, modify, add, or remove all or part of the License Agreement at any time, with or without notice.

Billing Agreement

  • By entering into a Billing Agreement with CBT Nuggets, you authorize CBT Nuggets to use automatic billing and to charge your credit card on a recurring basis.
  • You agree to pay subscription charges on a monthly basis, under the following terms and conditions:
    • CBT Nuggets will periodically charge your credit card each monthly billing cycle as your subscription charges become due;
    • All payments are non-refundable and charges made to the credit card under this agreement will constitute in effect a "sales receipt" and confirmation that services were rendered and received;
    • To terminate the recurring billing process and/or arrange for an alternative method of payment, you must notify CBT Nuggets at least 24 hours prior to the end of the monthly billing cycle;
    • You will not dispute CBT Nuggets’ recurring billing charges with your credit card issuer so long as the amount in question was for periods prior to the receipt and acknowledgement of a written request to cancel your account or cancel individual licenses on your account.
  • You guarantee and warrant that you are the legal cardholder for the credit card associated with the account, and that you are legally authorized to enter into this recurring billing agreement.
  • You agree to indemnify, defend and hold CBT Nuggets harmless, against any liability pursuant to this authorization.
  • You agree that CBT Nuggets is not obligated to verify or confirm the amount for the purpose of processing these types of payments. You acknowledge and agree that Recurring Payments may be variable and scheduled to occur at certain times.
  • If your payment requires a currency conversion by us, the amount of the currency conversion fee will be determined at the time of your payment. You acknowledge that the exchange rate determined at the time of each payment transaction will differ and you agree to the future execution of payments being based on fluctuating exchange rates.

CBT Nuggets reserves the right, in its sole discretion, to change, modify, add, or remove all or part of the Billing Agreement at any time, with or without notice.

Apache Hadoop

Course Duration: 08:47:19
Hadoop Course Introduction
Welcome to Hadoop! This Nugget explains the challenges Big Data poses, how Hadoop was designed to solve them, and the value unstructured data can bring to companies of all sectors and sizes. We'll also cover the state of data, how companies use Big Data, and a high-level overview of Hadoop and its core technologies.
Hadoop Technology Stack
The sheer number of technologies around Hadoop is enough to make even the bravest IT souls cringe. This Nugget is here to save the day! We'll cover the core, essential and upcoming Hadoop projects and see what a basic Hadoop implementation looks like.
Hadoop Distributed File System (HDFS)
This Nugget dives into the architecture and internal workings of HDFS! We'll cover all of the HDFS node types along with their responsibilities in a Hadoop cluster, talk about single and multi-rack cluster topologies, rack awareness, and how HDFS handles block management. We'll also cover some of the major user and administration tools used to interact with HDFS.
Introduction to MapReduce
The world of distributed data processing and programming starts here! This Nugget begins by covering Hadoop's MapReduce architecture to see how the JobTracker and TaskTracker work together to serve up data stored across a cluster. We'll also cover MapReduce's internal phases and see how data flows through a MapReduce pipeline. This Nugget ends with a live demonstration that executes a MapReduce job across a local Hadoop cluster.
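The map, shuffle, and reduce phases described above can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the Hadoop Java API; the word-count task is the classic introductory MapReduce example:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    return [(word, 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return key, sum(values)

lines = ["hadoop stores big data", "hadoop processes big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
reduced = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(reduced)  # {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

In a real cluster the map calls run on the nodes holding the data blocks and the shuffle moves grouped pairs across the network to the reducers; the logical flow is the same.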
Installing Apache Hadoop (Single Node)
Installation time, w00t! This Nugget will walk you through an installation of Hadoop in pseudo-distributed mode to create a single node Hadoop cluster. We'll cover how to install and configure SSH, Java, and Hadoop; start, verify, and stop all of the Hadoop daemons; and cover time-saving tips along the way to make it as painless as possible.
Installing Apache Hadoop (Multi Node)
This Nugget will show you how to break out from a single node cluster to a fully distributed multi-node cluster. We'll cover what cluster configurations look like, how to configure master/slave nodes, and how to properly start and stop a multi-node cluster, along with some common cluster "stuff." Good times!
Troubleshooting, Administering and Optimizing Hadoop
This Nugget will get you up to speed on troubleshooting and tuning a cluster. We'll cover the troubleshooting process, walk through a demonstration on how to reproduce and fix a common installation issue and see where to look when issues occur. We'll also cover common administration tasks and walk through a demo on how to benchmark and tune a cluster using TeraGen/TeraSort.
Managing HDFS
Data! Data! Get your data here! We'll cover where to find sample data, big and small, for your Hadoop clusters, and how to push that data into HDFS and manage it with utilities such as dfsadmin and fsck. We'll also cover the upgrade process and how to configure rack awareness.
MapReduce Development
Take a journey through the magical land of MapReduce development! This Nugget will cover the development process and go through multiple live demonstrations on how to code, test, build, and run a MapReduce job on local filesystem data and against HDFS data in a live cluster.
Introduction to Pig
This Nugget will cover Pig, the data flow scripting language of Hadoop! You'll learn how Pig is a simple abstraction on top of MapReduce to quickly and easily write queries against HDFS data. We'll get the basics and components of Pig down, get familiar with the Pig Latin language, install and configure Pig, and see it in action with a demo. Oink Oink!
Developing with Pig
This Nugget is chock-full of Pig demonstrations! We'll cover how to load, store, filter, group, aggregate, and sort HDFS data interactively using Pig Latin within the grunt shell. We'll follow that up with batch processing by analyzing and executing complete Pig scripts where we cover how to join, combine and split our data flows, as well as write and implement our own custom user-defined functions.
Introduction to Hive
This Nugget covers Hive, the SQL of Hadoop! You'll learn how Hive is another simple abstraction on top of MapReduce that provides us with a familiar way to access HDFS data. We'll cover the components and architecture of Hive to see how it stores data in table-like structures over HDFS data. Also covered are the basics of HiveQL, the SQL-like query language used to query those structures, and installation and configuration of Hive. We'll end this Nugget with a live demo to see Hive and HiveQL in action. BZZZT!
Developing with Hive
This Nugget is chock-full of Hive demonstrations! We'll cover how to create external, internal, and partitioned Hive tables, load data from the local filesystem as well as the distributed filesystem (HDFS), set up dynamic partitioning, create views, and manage indexes.
Introduction to HBase
This Nugget covers HBase, the low-latency way of getting small specific data out of Hadoop! We'll start with what HBase is all about and highlight the differences between row and column-oriented data stores. Also, we'll get familiar with the architecture of HBase, get it up and running in our Hadoop cluster, and even see it in action by creating, loading, and dropping an HBase table. Woohoo HBase!
Developing with HBase
Learn how to work with HBase data in this Nugget! We'll talk about the many ways to load and access HBase data, see how to configure a fully distributed HBase cluster, load data into an HBase table using Pig, query an HBase table using Hive, as well as pull a record of data out in real-time by starting and using a REST server.
Introduction to Zookeeper
Learn how to coordinate distributed applications with Zookeeper in this Nugget! We'll cover what Zookeeper is, the architecture, internal data storage, and learn how to bring up an ensemble of Zookeeper servers and store data within those servers.
Introduction to Sqoop
This Nugget will show you how to transfer data between Hadoop and relational database systems using Sqoop. We'll cover how to get Sqoop installed and configured, import data from a MySQL server into HDFS, import data from a SQL Server into Hive, and export data from Hadoop into a SQL Server instance. Sqadoosh!
Local Hadoop: Cloudera CDH VM
This Nugget will show you how to get up and running with Hadoop and its projects quickly! Cloudera's quick-start virtual machine is a great way to jump in and start learning Hadoop without the hassle of fully configuring a Hadoop cluster. We'll cover what Cloudera CDH is all about and some of the unique tools it offers, how to obtain Cloudera's quickstart VM, get it up and running, and take a tour of the VM, including Cloudera Manager and Hue.
Cloud Hadoop: Amazon EMR
This Nugget will show you how to use Amazon Web Services (AWS) Elastic MapReduce (EMR) to run fully managed Hadoop jobs in the cloud! We'll cover what EMR is and see how it's built on top of EC2 for processing and S3 for storage. We'll also transfer data directly from HDFS to an S3 bucket and create an EMR job flow to process the data stored inside of S3. (side note: all of these acronyms combined = AWSEMREC2S3!)
Cloud Hadoop: Microsoft HDInsight
Are you a Microsoft shop looking for a Hadoop solution that seamlessly integrates with your existing technology stack? Look no further! This Nugget will get you up to speed on HDInsight, Microsoft's take on Hadoop both locally and in the cloud. We'll cover HDInsight from the Windows Azure cloud and see how to get a fully configured Hadoop cluster up and running in minutes. We'll also take a tour of the HDInsight web portal, run a few samples and even RDP into our cluster. Who knew elephants could fly?!

No Bookmarks

The data revolution is upon us and Hadoop is THE leading Big Data platform. Fortune 500 companies are using it for storing and analyzing extremely large datasets, while other companies are realizing its potential and preparing their budgets for future Big Data positions. It's the elephant in Big Data's room!

Recommended skills:
  • Familiarity with Ubuntu Linux

Recommended equipment:
  • Ubuntu Linux 12.04 LTS operating system

Related certifications:
  • None

Related job functions:
  • Big Data architects
  • Big Data administrators
  • Big Data developers
  • IT professionals

  • This course will get you up to speed on Big Data and Hadoop. Topics include how to install, configure and manage a single and multi-node Hadoop cluster, configure and manage HDFS, write MapReduce jobs and work with many of the projects around Hadoop such as Pig, Hive, HBase, Sqoop, and Zookeeper. Topics also include configuring Hadoop in the cloud and troubleshooting a multi-node Hadoop cluster.

Hadoop Course Introduction

00:00:00 - Hadoop Series Introduction.
00:00:02 - Hey, everyone.
00:00:03 - Garth Schulte from CBT Nuggets.
00:00:04 - It's an honor to be your guide through this series and the
00:00:07 - wonderful world of big data.
00:00:09 - There's a buzz term for you, big data.
00:00:11 - Kind of like the cloud, two of the most popular buzz terms
00:00:14 - around today, and for good reason.
00:00:15 - There's a big market and a big future for big data.
00:00:19 - Companies are starting to realize there's an untapped
00:00:21 - treasure trove of information sitting in unstructured
00:00:24 - documents on hard drives everywhere.
00:00:28 - And while Hadoop was built to handle terabytes and petabytes
00:00:31 - of data-- and that's what a lot of the big companies
00:00:33 - jumped on it for.
00:00:34 - That's the essence of big data--
00:00:35 - even small to medium-size companies are realizing they
00:00:38 - have untapped information in unstructured documents spread
00:00:42 - across the network.
00:00:43 - We have emails.
00:00:44 - Emails, imagine if we could data mine our emails, the kind
00:00:47 - of information we can find in them.
00:00:49 - Documents, PDFs, spreadsheets, text files, all of this
00:00:52 - unstructured data sitting across a network that contains
00:00:55 - answers, answers that will help us create new products,
00:00:57 - refine existing products, discover trends, improve
00:01:00 - customer relations, understand ourselves and
00:01:03 - even our company better.
00:01:05 - Hadoop answers many of the big data
00:01:07 - challenges that we see today.
00:01:09 - How do you store terabytes and petabytes of information?
00:01:12 - How do you access that information quickly?
00:01:13 - And how do you work with data that's in a variety of
00:01:16 - different formats, structured, semi-structured, unstructured?
00:01:20 - And how do you do all that in a scalable, fault-tolerant,
00:01:23 - and flexible way?
00:01:24 - That's what this series is all about, Hadoop, Hadoop, Hadoop,
00:01:27 - and big data and data in a variety of different formats.
00:01:29 - Hadoop is a distributed software solution.
00:01:33 - It's a distributed framework.
00:01:34 - It's a way that we can take a cluster of
00:01:36 - machines, a lot of machines.
00:01:38 - Rather than having one or a couple of big expensive machines,
00:01:40 - it's a way to have a lot of commodity machines, low to
00:01:43 - medium-range machines, that work together to store and
00:01:48 - process our data.
00:01:50 - We're going to kick off the Hadoop series introduction
00:01:52 - here with a look at the state of data.
00:01:54 - We'll get some statistics.
00:01:55 - We'll take a look at what this data explosion, another buzz
00:01:58 - term for you, what that's all about.
00:02:00 - It's really around unstructured data.
00:02:02 - The internet is growing at an alarming rate.
00:02:04 - We are entering data at an alarming rate as it's becoming
00:02:07 - more accessible.
00:02:08 - It's becoming a lot easier to enter data.
00:02:10 - Companies and websites all over are tracking
00:02:13 - everything we do.
00:02:14 - So data really is exploding.
00:02:16 - And again, it shows no signs of slowing down.
00:02:18 - We'll get some statistics.
00:02:19 - We'll get some use cases going of companies that currently
00:02:22 - have big data and what they're doing with it.
00:02:24 - And we'll talk about structured, unstructured, and
00:02:26 - semi-structured data and also the three V's, which are the
00:02:29 - three big challenges of big data, volume,
00:02:32 - velocity, and variety.
00:02:34 - From there we'll jump into a high-level overview of Hadoop.
00:02:37 - We'll get familiar with the core components.
00:02:39 - We'll talk about how Hadoop is scalable, fault-tolerant,
00:02:43 - flexible, fast, and intelligent.
00:02:45 - And we'll even get some comparisons going between the
00:02:47 - relational world and the unstructured world, and we can
00:02:50 - see how Hadoop really is a complement to that world.
00:02:52 - We'll even get some use cases going on internal
00:02:54 - infrastructure, such as Yahoo, who has a 4,500
00:02:57 - node cluster set up.
00:02:58 - They said they have 40,000 machines with Hadoop on them.
00:03:02 - So we'll see some of the specs of those machines and how
00:03:04 - Yahoo uses Hadoop.
00:03:06 - We'll also talk about cloud Hadoop.
00:03:08 - Everyone's got a cloud implementation these days,
00:03:10 - Amazon, Cloudera, Microsoft, Hortonworks, the list goes on
00:03:15 - and on, IBM.
00:03:16 - And we'll see how the New York Times used cloud-based Hadoop
00:03:20 - to turn all of their articles into PDFs in an incredible
00:03:24 - amount of time and an extremely low cost.
00:03:26 - At the end of this Nugget, we're going to look at the
00:03:28 - series layout.
00:03:28 - We'll start with the Nugget layout, so you can get an idea
00:03:31 - of what the 20 Nuggets in this series consist of.
00:03:33 - And then we'll look at the network layout.
00:03:36 - We're going to head over to the virtual Nugget Lab and
00:03:37 - create a cluster of four machines.
00:03:40 - We're going to spread gigabytes of data across those
00:03:42 - machines and use Hadoop's technology stack to manage and
00:03:46 - work with the data inside of our cluster.
00:03:48 - So strap on your seat belt.
00:03:49 - It's going to be a ride.
00:03:50 - It's going to be a fun ride.
00:03:51 - We're going to learn a lot.
00:03:52 - We're going to laugh.
00:03:53 - We're going to cry, probably cry a lot more than laugh.
00:03:55 - But you haven't truly worked with Hadoop until you've shed
00:03:58 - a few tears.
00:03:59 - Let's start with the current state of data.
00:04:01 - The state of data can be summed up in one word, a lot.
00:04:04 - There's a lot of data out there.
00:04:06 - And the mind-boggling stat that I still can't seem to
00:04:08 - wrap my brain around here is that 90% of the world's data
00:04:11 - was created in the last 2 years.
00:04:14 - That's a lot.
00:04:14 - And that says something.
00:04:15 - That alone tells me that, yeah, big data's here, and
00:04:19 - it's only going to get crazier.
00:04:22 - So that's why big data is really spawning off its own
00:04:25 - field in IT, and it's going to be a big market here in the
00:04:28 - next 5 to 10 years.
00:04:30 - So it's a great field to get into.
00:04:31 - There's going to be plenty of jobs.
00:04:32 - And there's already a serious lack of talent in the field.
00:04:35 - But 90% of the world's data was created in the last 2 years.
00:04:38 - A lot of that, sure, is due to everything being a lot more
00:04:41 - accessible with smartphones and tablets--
00:04:43 - anybody can access data from pretty much anywhere--
00:04:46 - but also the advent of social media.
00:04:48 - Social media is everywhere, Twitter, Facebook, Instagram,
00:04:52 - Tumblr, the list goes on and on.
00:04:54 - And so we're generating data at an alarming rate.
00:04:56 - Check this out.
00:04:57 - This is what we're talking about with the data explosion.
00:05:00 - Any time you hear that term data explosion, they're
00:05:01 - referring to the explosion of unstructured and
00:05:04 - semi-structured data on the web.
00:05:06 - Since the mid-'90s, it's been on a tear, this exponential
00:05:10 - rate that shows no signs of slowing down.
00:05:12 - Structured data over the last 40 years has been on a pretty
00:05:15 - standard manageable, predictable curve.
00:05:18 - And just to give some examples of these kinds of data,
00:05:20 - unstructured data, things like emails, PDFs, documents on the
00:05:24 - hard drive, text files, that kind of stuff.
00:05:26 - Semi-structured are things that have some form of
00:05:29 - hierarchy or a way to delimit the data, like XML.
00:05:33 - XML is a tag-based format that describes the data inside of
00:05:36 - the tags, and it's hierarchical, so that's got
00:05:39 - some structure.
00:05:39 - Any sort of delimiter-based file, tab-separated or CSV kind of
00:05:43 - stuff, those are all semi-structured, because they
00:05:45 - don't have a hard schema attached to them.
00:05:47 - Structured data, in a relational world, everything
00:05:50 - has a schema associated with it.
00:05:52 - And it's checked against the schema when you put it into
00:05:54 - the database.
00:05:55 - So what are some of these big companies
00:05:56 - doing with this data?
00:05:58 - And how do they do it?
00:05:59 - Well, Google, for instance, who seems to be the pioneers
00:06:01 - for everything, back in the day when they were starting
00:06:03 - out, they said, how do we make the internet searchable?
00:06:05 - We need to index a billion pages.
00:06:08 - They built this technology called MapReduce along with
00:06:11 - GFS, the Google File System.
00:06:14 - And that's really what Hadoop is based on.
00:06:16 - A gentleman by the name of Doug Cutting back in 2006
00:06:20 - joined Yahoo, and Yahoo gave him a team of engineers.
00:06:22 - And they built Hadoop based on those white papers, the
00:06:26 - Google File System and MapReduce.
00:06:28 - The two core technologies in Hadoop are
00:06:30 - MapReduce and HDFS, the Hadoop Distributed File System.
00:06:35 - Back to the story here, Google indexed a billion pages using
00:06:38 - the same technology that we're going to learn about here.
00:06:41 - Now today 60 billion pages is what Google indexes to make
00:06:46 - searchable for the internet.
00:06:47 - And it still boggles my mind every time I do a Google
00:06:49 - search that it comes back in like 0.02 seconds.
00:06:52 - I'm like, how in the heck did it do that?
00:06:55 - Now we know, right?
00:06:56 - Facebook is another one.
00:06:57 - They boast they have the largest Hadoop
00:06:59 - cluster on the planet.
00:07:01 - They have a cluster that contains overall 100
00:07:03 - petabytes of data.
00:07:05 - And on top of that, they generate half a petabyte of
00:07:09 - data every single day.
00:07:11 - That's crazy.
00:07:12 - Anything everybody does on Facebook, from logging in to
00:07:16 - clicking to liking something is tracked on Facebook.
00:07:19 - And this is how they use that data to target you.
00:07:22 - They do ad targeting.
00:07:23 - Not that anybody clicks on Facebook ads, but that's how
00:07:25 - they ad target you is they look at what you're clicking,
00:07:28 - what your friends are clicking, find the
00:07:29 - commonalities between them, something called collaborative
00:07:32 - filtering, otherwise known as a recommendation engine.
00:07:35 - And that's how they do all of their ad targeting.
00:07:37 - Amazon's the same way.
00:07:38 - They do the exact same machine learning algorithms.
00:07:42 - They have these recommendation engines.
00:07:44 - Again, the fancy term for that is collaborative filtering.
00:07:46 - But any time you go to Amazon, if you buy something, or even
00:07:49 - if you just browse to a product, they look at all the
00:07:52 - other people that have also looked at that product, find
00:07:54 - the commonalities and then recommend them to you.
00:07:56 - It's a pretty brilliant and cool system.
00:07:59 - Twitter is another one, 400 million tweets a day, which is
00:08:03 - nearly 280,000 tweets a minute, which equates to about 15
00:08:09 - terabytes of data every day.
00:08:10 - What do you do with that data if you're Twitter?
00:08:13 - Well, they have a lot of Hadoop clusters set up where
00:08:15 - they run thousands of MapReduce jobs every night,
00:08:17 - twisting and turning and looking at that data in a
00:08:18 - variety of ways to discover trends.
00:08:21 - They know all the latest and greatest trends, and they
00:08:23 - probably sell that information to people who make products so
00:08:26 - they can target those trends.
00:08:27 - Another interesting use case is GM, General Motors,
00:08:31 - American car manufacturer.
00:08:33 - Just recently, they cut off a multi-billion-dollar contract
00:08:36 - they had with HP and some other vendors for outsourcing.
00:08:40 - What are they doing?
00:08:41 - They're building two 20,000-square-feet warehouses.
00:08:45 - They're bringing it all in-house.
00:08:46 - They're going to load up those warehouses with miles and
00:08:49 - miles of racks containing low-end x86 machines.
00:08:53 - They're going to install Hadoop on them and do all
00:08:54 - their big data analytics inside.
00:08:56 - That is just awesome.
00:08:57 - And thanks to Jeremy Cioara, every time I see a picture of
00:08:59 - a data center now I just want to lick it.
00:09:01 - I don't know if you've seen his Nugget where he says,
00:09:03 - don't you just want to lick it?
00:09:04 - But thanks, Jeremy.
00:09:05 - It looks very lickable.
00:09:07 - And as if you need any more proof that this big data is
00:09:10 - more than just a fad, check out these stats.
00:09:12 - Global IP traffic-- and this is a Cisco stat-- is said to
00:09:15 - triple by 2015, triple, which they say will take us into the
00:09:20 - zettabyte range.
00:09:21 - Right now we're in the exabyte.
00:09:22 - I can't imagine being in the zettabytes.
00:09:24 - That is an unbelievable pile of data that's
00:09:28 - going to be out there.
00:09:29 - So big data is definitely here to stay.
00:09:31 - And also, 2/3 of North American companies that were
00:09:34 - interviewed said big data is in their five-year plan.
00:09:37 - So it's a good time to get into big data. There are going to
00:09:41 - be a ton of jobs in the future.
00:09:43 - There already are plenty of jobs out there for big data,
00:09:45 - but it's just going to expand exponentially as time goes on.
00:09:48 - The last thing I want to talk about on this slide are the
00:09:50 - three V's, volume, velocity, and variety, the three big
00:09:53 - reasons we cannot use traditional computing models,
00:09:57 - which is big expensive tricked out machines with lots of
00:10:01 - processors that contain lots of cores, and we have lots of
00:10:03 - hard drives, RAID-enabled for
00:10:05 - performance and fault tolerance.
00:10:07 - It's a hardware solution.
00:10:08 - We can't use hardware solutions.
00:10:09 - I'll explain why in a minute.
00:10:10 - And also, a reason we cannot use the relational world.
00:10:13 - The relational world was really designed to handle
00:10:14 - gigabytes of data, not terabytes and petabytes.
00:10:17 - And what does the relational world do when they start
00:10:18 - getting too much data?
00:10:19 - They archive it to tape.
00:10:21 - That is the death of data.
00:10:23 - It's no longer a part of your analysis.
00:10:25 - It's no longer a part of your business
00:10:27 - intelligence and your reporting.
00:10:29 - And that's bad.
00:10:30 - Velocity is another one.
00:10:31 - Velocity is the speed at which we access data.
00:10:34 - The traditional computing models, no matter how fast
00:10:37 - your computer is, your processor is still going to be
00:10:39 - bound by disk I/O, because disk transfer rates haven't
00:10:42 - evolved at nearly the pace of processing power.
00:10:45 - This is why distributed computing makes more sense.
00:10:47 - And Hadoop uses the strengths of the current computing world
00:10:52 - by bringing computation to the data.
00:10:54 - Rather than bringing data to the computation, which
00:10:57 - saturates network bandwidth, it can
00:10:59 - process the data locally.
00:11:01 - And when you have a cluster of nodes all working together
00:11:05 - using and harnessing the power of the processor and reducing
00:11:09 - network bandwidth and mitigating the weakness of
00:11:12 - disk transfer rates, we have some pretty impressive
00:11:16 - performance when processing.
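To put some rough numbers on that, here's a little Python back-of-the-envelope (purely illustrative; the link speed, data size, and job size below are assumptions, not Cisco or Hadoop figures):

```python
# Illustrative cost model (not Hadoop code): compare shipping a 1 PB
# dataset across the network versus shipping a small job to the nodes
# that already hold the data. All numbers are hypothetical assumptions.

DATA_BYTES = 10**15            # 1 PB of input data
JOB_BYTES = 10**6              # ~1 MB of MapReduce job code
NETWORK_BPS = 10 * 10**9 / 8   # assumed 10 Gb/s link, in bytes/sec
NODES = 900                    # nodes in the cluster

# Bringing data to the computation: everything crosses the network.
move_data_secs = DATA_BYTES / NETWORK_BPS

# Bringing computation to the data: only the job code crosses the
# network, once per node.
move_code_secs = (JOB_BYTES * NODES) / NETWORK_BPS

print(f"ship data: {move_data_secs / 3600:,.1f} hours on the wire")
print(f"ship code: {move_code_secs:.3f} seconds on the wire")
```

Even with generous assumptions, moving the data loses by several orders of magnitude, which is the whole argument for data locality.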
00:11:18 - In fact, Yahoo broke records of processing and sorting a
00:11:22 - terabyte of data multiple times using Hadoop.
00:11:25 - In fact, they did it back in 2008.
00:11:27 - The record at the time was 297 seconds.
00:11:29 - They smashed it in 209 seconds on a 900-node Hadoop cluster,
00:11:34 - just a bunch of commodity machines with 8 gigs of RAM,
00:11:37 - dual quad-core processors and 4 disks attached to each one,
00:11:40 - pretty impressive stuff.
00:11:41 - The last V, variety, obviously relational systems can only
00:11:44 - handle the structured data.
00:11:45 - Sure, they can get semi-structured and
00:11:47 - unstructured data in with some engineers that have ETL tools
00:11:50 - that'll transform and scrub the data to bring it in.
00:11:53 - But that requires a lot of extra work.
00:11:56 - So let's get introduced to Hadoop and see how it solves
00:11:58 - these three challenges and talk about how it's a
00:12:00 - software-based solution and all the benefits we gain with
00:12:03 - that over the hardware-based solution of the traditional
00:12:05 - computing world.
00:12:06 - As I mentioned earlier, Hadoop is this
00:12:08 - distributed software solution.
00:12:10 - It is a scalable, fault-tolerant distributed
00:12:12 - system for data storage and processing.
00:12:16 - There are two main components in Hadoop: HDFS, which is the
00:12:19 - storage, and MapReduce, which is the retrieval and the
00:12:21 - processing.
00:12:23 - HDFS is this self-healing
00:12:27 - high-bandwidth clustered storage.
00:12:29 - And it's pretty awesome stuff.
00:12:31 - What would happen here is if we were to put a petabyte file
00:12:33 - inside of our Hadoop cluster, HDFS would break it up into
00:12:36 - blocks and then distribute it across all the
00:12:38 - nodes in our cluster.
00:12:40 - On top of that-- and this is where the fault tolerance side
00:12:42 - is going to come into play--
00:12:44 - when we configure HDFS, we're going to set up
00:12:45 - a replication factor.
00:12:47 - By default, it's set at 3.
00:12:48 - What that means is when we put this file in Hadoop, it's
00:12:50 - going to make sure that there are three copies of every
00:12:53 - block that make up that file spread across all the nodes in
00:12:56 - the cluster.
00:12:57 - That's pretty awesome.
00:12:58 - And why that's awesome is because if we lose a node,
00:13:03 - it's going to self-heal.
00:13:04 - It's going to say, oh, I know what data was on that node.
00:13:07 - I'm just going to re-replicate the blocks that were on that
00:13:09 - node to the rest of the servers inside of the cluster.
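Here's a tiny Python toy model of that replication-and-self-heal idea (a sketch, not real HDFS code; the node and block names are made up):

```python
import itertools

# Toy model of HDFS-style block placement and self-healing.
REPLICATION = 3  # HDFS's default replication factor

def place_blocks(blocks, nodes, replication=REPLICATION):
    """NameNode-style metadata: each block maps to the nodes holding it."""
    ring = itertools.cycle(nodes)
    return {b: [next(ring) for _ in range(replication)] for b in blocks}

def heal(metadata, dead_node, live_nodes):
    """Re-replicate every block that lost a copy on the dead node."""
    for block, holders in metadata.items():
        if dead_node in holders:
            holders.remove(dead_node)
            # copy the block to a live node that doesn't already hold it
            holders.append(next(n for n in live_nodes if n not in holders))

nodes = [f"data{i}" for i in range(1, 6)]                  # 5 DataNodes
meta = place_blocks([f"blk_{i}" for i in range(10)], nodes)

heal(meta, "data1", [n for n in nodes if n != "data1"])    # data1 dies
print(all(len(set(h)) == 3 for h in meta.values()))        # True: 3 copies again
```

Real HDFS is vastly more sophisticated about where it puts replicas, but the bookkeeping idea is the same: the metadata knows every block's holders, so losing a node just triggers re-replication.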
00:13:13 - And how it does it is this.
00:13:14 - It has a NameNode and a DataNode.
00:13:17 - Generally, you have one NameNode per cluster, and then
00:13:19 - all the rest of these here are going to be DataNodes.
00:13:21 - And we'll get into more details of the roles and
00:13:24 - secondary NameNodes and all that
00:13:25 - stuff when we get there.
00:13:26 - But essentially, the NameNode is just a metadata server.
00:13:28 - It just holds in memory the location of every block on
00:13:31 - every node.
00:13:32 - And even if you have multiple racks set up, it'll know where
00:13:35 - blocks exist on what node on what rack spread across the
00:13:38 - cluster inside your network.
00:13:40 - So that's the secret sauce behind HDFS, and that's how
00:13:44 - we're fault-tolerant and redundant, and it's just
00:13:46 - really awesome stuff.
00:13:47 - Now how we get data is through MapReduce.
00:13:50 - And as the name implies, it's really a two-step process.
00:13:53 - There's a little more to it than that.
00:13:53 - But again, we're going to keep this high level.
00:13:55 - So we'll get into MapReduce.
00:13:56 - We've got a few Nuggets on MapReduce.
00:13:57 - We'll get in down to the nitty-gritty details, and
00:14:00 - we'll also break out Java and write some
00:14:02 - MapReduce jobs on our own.
00:14:03 - But it's a two-step process at the surface.
00:14:06 - There's a mapper and a reducer.
00:14:08 - Programmers will write the mapper function, which will go
00:14:12 - out and tell the cluster what data
00:14:14 - points we want to retrieve.
00:14:16 - The reducer will then take all that data and aggregate it.
00:14:20 - So again, Hadoop is a batch-processing-based system.
00:14:23 - And we're working on all of the data in the cluster.
00:14:26 - We're not doing any seeking or anything like that.
00:14:28 - Seeks are what slow down data retrieval.
00:14:31 - MapReduce is all about working on all of the data inside of
00:14:34 - our cluster.
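To make the two-step flow concrete, here's a minimal sketch in plain Python rather than the Hadoop Java API (the log records and field names are invented for illustration):

```python
from collections import defaultdict

# Conceptual sketch of MapReduce: a mapper emits (key, value) pairs,
# the framework groups them by key (the "shuffle"), and a reducer
# aggregates each group. Hypothetical records: "page_id bytes_served".

def mapper(record):
    page, nbytes = record.split()
    yield page, int(nbytes)            # emit (page, bytes)

def reducer(key, values):
    return key, sum(values)            # aggregate: total bytes per page

def run_job(records):
    groups = defaultdict(list)         # the shuffle step
    for rec in records:
        for k, v in mapper(rec):
            groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in groups.items())

logs = ["home 120", "about 80", "home 300"]
print(run_job(logs))                   # {'home': 420, 'about': 80}
```

In real Hadoop, many mappers run in parallel on the nodes holding the data blocks, and the shuffle happens across the network, but the shape of the job is exactly this.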
00:14:35 - And MapReduce can scare some folks away, because
00:14:38 - you think, oh, I've got to know Java in order to write these
00:14:41 - Java jobs to pull data out of the cluster.
00:14:42 - Well, that's not entirely true.
00:14:44 - A lot of things have popped up in the Hadoop ecosystem over
00:14:47 - the last couple of years that attract many people.
00:14:50 - And this is where the flexibility comes into play,
00:14:52 - because you don't need to understand Java to get data
00:14:55 - out of the cluster.
00:14:56 - In fact, the engineers at Facebook built a subproject
00:15:00 - called Hive, which is a SQL interpreter.
00:15:02 - Facebook said, you know what?
00:15:03 - We want lots of people to be able to write ad hoc jobs
00:15:07 - against our cluster, but we're not going to force everybody
00:15:10 - to learn Java.
00:15:12 - So that's why they had a team of engineers build Hive.
00:15:14 - And now anybody that's familiar with SQL, which most
00:15:16 - data professionals are, can now pull
00:15:18 - data out of the cluster.
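As a stand-in for Hive, here's the same kind of GROUP BY query run in Python against sqlite3 (a hypothetical table; the point is that a Hive user writes SQL like this, and Hive compiles it into MapReduce jobs over the cluster):

```python
import sqlite3

# Hive lets SQL-literate folks query the cluster without writing Java.
# This sqlite3 demo just shows the *kind* of query involved; the table
# and data are made up for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_views (page TEXT, bytes INTEGER)")
db.executemany("INSERT INTO page_views VALUES (?, ?)",
               [("home", 120), ("about", 80), ("home", 300)])

# A Hive user would write essentially this same aggregation:
rows = db.execute(
    "SELECT page, SUM(bytes) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)   # [('about', 80), ('home', 420)]
```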
00:15:20 - Pig is another one.
00:15:21 - Yahoo went and built Pig as a high-level dataflow language
00:15:25 - to pull data out of a cluster.
00:15:26 - And all Hive and Pig do, under the hood, is create
00:15:29 - MapReduce jobs and submit them to the cluster.
00:15:31 - That's the beauty of an open source framework.
00:15:33 - People can build and add to it.
00:15:35 - The Hadoop community keeps growing.
00:15:37 - More technologies and projects are added to the Hadoop
00:15:40 - ecosystem all the time, which just makes it more
00:15:42 - attractive to more and more folks.
00:15:45 - Again, as these technologies emerge and Hadoop matures,
00:15:48 - you're going to see it become attractive to more than
00:15:51 - just the big businesses.
00:15:52 - Small, medium-sized businesses, people of all
00:15:55 - types of industries are going to jump into Hadoop and start
00:15:58 - mining all kinds of data in their network.
00:16:00 - All right.
00:16:00 - So we're fault-tolerant through HDFS.
00:16:02 - We're flexible in how we can retrieve the data.
00:16:05 - And we're also flexible in the kind of data
00:16:07 - we can put in Hadoop.
00:16:08 - As we saw, structured, unstructured, semi-structured,
00:16:10 - we can put it all in there.
00:16:11 - But we're also scalable.
00:16:13 - The beauty of scalability is it just kind of happens by
00:16:15 - default because we're in the distributed computing
00:16:17 - environment.
00:16:18 - We don't have to do anything special
00:16:19 - to make Hadoop scalable.
00:16:20 - We just are.
00:16:21 - Let's say our MapReduce job starts slowing down because we
00:16:23 - keep adding more data into the cluster.
00:16:25 - What do we do?
00:16:26 - We add more nodes, which increases the overall
00:16:29 - processing power of our entire cluster, which is pretty
00:16:32 - awesome stuff.
00:16:33 - And adding nodes is really a piece of cake.
00:16:34 - We just install the Hadoop binaries, point them to the
00:16:37 - NameNode, and we're good to go.
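A back-of-the-envelope model of that scaling story (the per-node throughput is an assumed number, and it assumes the work parallelizes cleanly across the cluster):

```python
# Illustrative scaling model: for embarrassingly parallel batch work,
# doubling the nodes roughly halves the job's runtime.

def job_hours(data_tb, nodes, tb_per_node_hour=0.25):
    """Assumed throughput: each node chews through 0.25 TB per hour."""
    return data_tb / (nodes * tb_per_node_hour)

for n in (10, 20, 40):
    print(f"{n:>3} nodes -> {job_hours(100, n):5.1f} hours for 100 TB")
```

Real jobs hit shuffle and coordination overhead eventually, but within reason, adding commodity nodes really is how you buy back performance in Hadoop.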
00:16:39 - Last but not least, Hadoop is extremely intelligent.
00:16:41 - We've already seen examples of this.
00:16:43 - The fact that we're bringing computation to the data, we're
00:16:45 - maximizing the strengths of today's computing world and
00:16:48 - mitigating the weaknesses, that alone is
00:16:50 - pretty awesome stuff.
00:16:51 - But on top of that, in a multi-rack environment--
00:16:54 - let's get some switches up here.
00:16:55 - Here's some rack switches, and here's
00:16:58 - our data center switch--
00:17:01 - it's rack-aware.
00:17:02 - This is something that we need to do manually.
00:17:04 - We need to configure this.
00:17:05 - And it's pretty simple to do.
00:17:06 - In a configuration file, we're just describing the network
00:17:08 - topology to Hadoop so it knows what DataNodes
00:17:11 - belong to what racks.
00:17:13 - And what that allows Hadoop to do is even more data locality.
00:17:18 - So whenever it receives a MapReduce job, it's going to
00:17:21 - find the shortest possible path to the data.
00:17:23 - If most of the data is on one rack and only a little bit on
00:17:26 - another rack, then it can get most of the
00:17:27 - data from one rack.
00:17:28 - And, again, this is where it's going to save on bandwidth.
00:17:31 - It's going to save on bandwidth, because it's going
00:17:32 - to keep data as local to the rack as possible.
00:17:35 - Pretty awesome stuff.
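Here's a toy sketch of rack-aware replica selection in Python (not Hadoop's actual implementation; the distance values follow HDFS's usual same-node/same-rack/off-rack convention, and the topology below is the kind of mapping you'd describe in that configuration file):

```python
# Toy rack awareness: given the node-to-rack topology, prefer the
# replica that is network-closest to the reader.
# Conventional HDFS distances: same node = 0, same rack = 2, off rack = 4.

topology = {"data1": "rack1", "data2": "rack1",
            "data3": "rack2", "data4": "rack2"}

def distance(reader, holder):
    if reader == holder:
        return 0
    return 2 if topology[reader] == topology[holder] else 4

def closest_replica(reader, replicas):
    return min(replicas, key=lambda node: distance(reader, node))

# A block replicated on data2, data3, and data4, read from data1:
print(closest_replica("data1", ["data2", "data3", "data4"]))  # data2 (same rack)
```

That same-rack preference is exactly how Hadoop keeps traffic off the data center switch and saves bandwidth.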
00:17:36 - Let's check out a couple of use cases, one from an
00:17:39 - architectural standpoint, Yahoo.
00:17:41 - Yahoo has over 40,000 machines, as I mentioned, with
00:17:44 - Hadoop on it.
00:17:44 - Their largest cluster sits at 4,500 nodes.
00:17:48 - Each node inside of that cluster has a dual quad-core
00:17:50 - CPU, 4 one-terabyte disks, and 16 gigabytes of RAM, pretty
00:17:55 - good size for a commodity machine, but certainly not an
00:17:58 - extremely high-end machine that we're talking about in
00:18:00 - the traditional computing sense.
00:18:01 - That's not bad at all.
00:18:04 - Another use case here is the cloud.
00:18:06 - The cloud.
00:18:07 - What about Hadoop in the cloud?
00:18:08 - Lots and lots of companies running Hadoop implementations
00:18:12 - in the cloud, Amazon being one of the more popular ones out
00:18:14 - there. Inside of Amazon Web Services they have
00:18:17 - something called EMR, Elastic MapReduce, which is just their
00:18:21 - own implementation of Hadoop.
00:18:22 - You can literally get it up and running in five
00:18:24 - minutes in the cloud.
00:18:25 - And that's going to be attractive for a lot of
00:18:27 - businesses that can't afford an internal infrastructure
00:18:30 - like Yahoo's or GM's, as we saw earlier.
00:18:32 - For instance, here's a good one, New York Times.
00:18:36 - The New York Times wanted to convert all of their articles
00:18:41 - to PDFs, four terabytes of articles into PDFs.
00:18:45 - How do you do that?
00:18:46 - How do you do that in a cost-effective way without
00:18:49 - buying an entire infrastructure or an entire
00:18:53 - army of engineers to implement that infrastructure or
00:18:56 - developers and all that good stuff?
00:18:58 - Here's how they did it.
00:18:59 - They fired up an AWS EC2 instance.
00:19:02 - They used S3.
00:19:03 - They put their four terabytes of TIFF data inside of S3.
00:19:06 - They spun up an EC2 instance and then just ran a MapReduce
00:19:11 - job to take those four terabytes,
00:19:13 - convert them into PDFs.
00:19:14 - It happened in less than 24 hours, and it cost them a
00:19:17 - grand total of $240.
00:19:20 - I know.
00:19:20 - That's crazy.
00:19:21 - The first time I saw it too, I was like, aren't they missing
00:19:23 - a few zeroes and commas and decimals and things that would
00:19:27 - make that number bigger?
00:19:29 - But believe it or not, that's all it is.
00:19:31 - And that's a beautiful thing.
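For what it's worth, here's one purely hypothetical way a number like $240 could add up (the instance count and hourly rate below are assumptions for illustration, not figures from the New York Times story):

```python
# Purely illustrative cloud-cost arithmetic (assumed fleet size and
# hourly rate): how a large batch job can total only a few hundred dollars.

instances = 100          # hypothetical number of EC2 instances
hours = 24               # the job finished in under a day
rate_per_hour = 0.10     # assumed per-instance hourly price, USD

total = instances * hours * rate_per_hour
print(f"${total:,.2f}")  # $240.00
```

Because you pay only for compute time, the bill stops the moment you spin the cluster down, which is what makes this model so attractive.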
00:19:32 - In fact, the cloud's really attractive, not only for small
00:19:35 - businesses but also for hobbyists or people that are
00:19:37 - trying to learn.
00:19:38 - Because you only pay for compute time.
00:19:40 - So while you're developing and learning this stuff, if you
00:19:42 - ever want to run it, spin it up, run it, spin it down.
00:19:45 - It'll cost you cents.
00:19:47 - Cents.
00:19:48 - So the cloud's huge, and it's attractive.
00:19:50 - And we're going to look later in the series how to get
00:19:53 - Hadoop up and running in the cloud.
00:19:54 - But first, we need to do it the hard way.
00:19:56 - So we're going to spend the first half of the series doing
00:19:58 - everything down at the ground level.
00:20:02 - We'll see how to do this stuff the hard way, and then we'll
00:20:05 - look at how to do it the easy way in the cloud.
00:20:06 - Lastly here, let's get a look at what's to come.
00:20:09 - So starting with the Nugget layout, you are here, the
00:20:12 - series introduction.
00:20:14 - From here, we're going to move into the technology stack.
00:20:16 - We'll get familiar with all of the projects that make up the
00:20:18 - Hadoop ecosystem.
00:20:19 - We'll look at a high level, just to get familiar and make
00:20:23 - some sense out of it.
00:20:24 - We'll look at Pig, Hive, HBase, Sqoop, ZooKeeper,
00:20:27 - Ambari, Avro, Flume, Oozie.
00:20:29 - Yeah, you see where I'm going with this?
00:20:30 - There's a lot of them.
00:20:32 - And we'll look at them all.
00:20:33 - We'll make sense of it.
00:20:34 - And believe me, it'll be a good exercise in what the heck
00:20:37 - is Hadoop and what are all these things around it and
00:20:40 - what are their roles and responsibilities and how are
00:20:43 - they going to make it easier to work with our cluster and
00:20:45 - the data within it.
00:20:46 - So we'll take the covers off all that next.
00:20:48 - Then we'll jump into HDFS.
00:20:50 - We'll get a good hard look at the internals of the Hadoop
00:20:54 - Distributed File System.
00:20:56 - From there, we'll learn how to install Hadoop.
00:20:59 - We're going to do this first on a single node.
00:21:02 - So we've got a video dedicated to single-node installation.
00:21:05 - And then we have a video dedicated to multi-node
00:21:08 - installation.
00:21:09 - So you're going to get very familiar with how to get
00:21:11 - Hadoop up and running from scratch.
00:21:13 - Then from there, we'll jump into another HDFS Nugget and
00:21:16 - learn how to configure it and manage HDFS
00:21:19 - inside of our cluster.
00:21:20 - Then we'll get into MapReduce.
00:21:21 - We've got a couple of Nuggets on MapReduce.
00:21:23 - We want to start by giving you a basic
00:21:25 - introduction.
00:21:26 - And then it's just going to be straight into it.
00:21:29 - I'm going to throw you right into it by learning how to
00:21:31 - develop MapReduce applications using Java.
00:21:33 - Then we'll get familiar with many of the big popular
00:21:39 - projects here in the Hadoop ecosystem.
00:21:41 - We've got a couple of Nuggets on each one of these: Pig,
00:21:43 - Hive, and HBase.
00:21:45 - Sqoop and ZooKeeper we only need one Nugget for, because
00:21:47 - they're not extremely huge and don't have a lot of concepts
00:21:49 - associated with them.
00:21:50 - But Pig, Hive, and HBase, we've got a couple Nuggets
00:21:52 - dedicated to each of those so you can get familiar with what
00:21:55 - they are and how to use them.
00:21:56 - We've got a Nugget on troubleshooting Hadoop.
00:21:58 - And then we've got a few Nuggets on cloud Hadoop.
00:22:00 - We'll take a couple of different looks at how to get
00:22:02 - Hadoop up and running in the cloud.
00:22:04 - By the end of this series, we'll have 20 Nuggets of
00:22:07 - Hadoop goodness that touch on a little bit of everything.
00:22:10 - Lastly, let's check out our network layout.
00:22:11 - We're going to create a four-node cluster up in the
00:22:14 - virtual Nugget Lab.
00:22:15 - All of these machines are going to be running Ubuntu
00:22:17 - Linux 12.04 LTS, which stands for long-term support.
00:22:21 - As for how Hadoop's versioning works, we're going to use the
00:22:24 - most recent stable release, which is 1.1.2.
00:22:26 - But essentially, 1.1.x are the stable releases, 1.2.x are the
00:22:30 - beta releases, and 2.x.x are the alpha releases.
00:22:34 - So we'll stick with the most recent stable release and also
00:22:37 - Java 7, which is 1.7.
00:22:39 - And again, I have Ubuntu Linux installed on these four
00:22:42 - machines, but that's it.
00:22:43 - That's as far as I took it.
00:22:44 - We're going to take these machines
00:22:46 - literally from scratch.
00:22:47 - I'll show you how we can download the Hadoop tar ball,
00:22:50 - get it all installed, configured at
00:22:53 - the single-node level.
00:22:54 - And then we'll do it again at the multi-node level.
00:22:56 - How it's going to work here is we're going to name this one
00:22:59 - Hadoop Nugget Name, or HN Name.
00:23:02 - So there's our NameNode.
00:23:03 - And then we're going to have a bunch of
00:23:04 - Hadoop Nugget DataNodes.
00:23:06 - So DataNode 1, HN Data 1, HN Data 2, and HN Data 3.
00:23:11 - Once we get our cluster set up, we get HDFS configured,
00:23:14 - then we'll get some data into here.
00:23:16 - We'll start with unstructured data.
00:23:18 - I'll show you how we can take books.
00:23:19 - Books are a great way to learn with, because it's easy to write
00:23:23 - MapReduce jobs to mine books for word counts.
00:23:26 - Show me how many times words appear in a book, and you can
00:23:29 - see what your favorite authors' favorite words are
00:23:33 - across all of the books that they've ever written.
00:23:35 - Kind of cool.
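That word-count idea, sketched in plain Python instead of a Hadoop MapReduce job (with a couple of made-up "books"):

```python
import re
from collections import Counter

# The classic word-count job in miniature. In Hadoop, mappers would emit
# (word, 1) for each word in a block of the book, and reducers would sum
# the counts per word; Counter does both steps here in one process.

books = [
    "the whale surfaced and the whale dove",
    "call me anything but call me late",
]

counts = Counter()
for book in books:
    for word in re.findall(r"[a-z']+", book.lower()):
        counts[word] += 1

print(counts.most_common(3))
```

Swap in the full text of an author's books and the top of `most_common` is exactly that "favorite words" list.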
00:23:36 - In fact, if you could data mine Nuggets, you could
00:23:37 - probably find that my favorite word is either cool or ah or
00:23:41 - something like that, probably a bunch of filler words.
00:23:43 - But anyway, the fun stuff.
00:23:45 - So we'll start with unstructured data.
00:23:46 - Then we'll move into more structured data.
00:23:48 - We'll find some good data out there that we can use.
00:23:50 - I'll show you where you can find big data sets.
00:23:52 - Amazon offers some.
00:23:53 - Infochimps offers some.
00:23:55 - So we'll definitely get some good data in
00:23:56 - here to work with.
00:23:57 - And then we'll use all the tools that we learned along
00:23:59 - the way to see how to manage the cluster and work with the
00:24:03 - data inside of it.
00:24:04 - It'll be fun.
00:24:05 - In this CBT Nugget, we took a Hadoop series introduction.
00:24:09 - We started off by talking about the state of data.
00:24:11 - We defined big data.
00:24:13 - We saw the challenges with big data.
00:24:14 - We saw how companies use big data for analytics.
00:24:17 - And we had some fun statistics along the way.
00:24:20 - We also got familiar with Hadoop, just did a basic
00:24:23 - high-level overview to introduce you to the core
00:24:25 - components of Hadoop: HDFS and MapReduce.
00:24:28 - We saw how it's a pretty impressive software-based open
00:24:32 - source solution to distributed computing that's scalable,
00:24:36 - fault-tolerant, fast, and flexible.
00:24:39 - And at the end here, we took a look at the series layout.
00:24:40 - We got familiar with the Nuggets in this series and
00:24:43 - really what we're going to be learning about throughout this
00:24:45 - series and also the network layout to get familiar with
00:24:47 - what we're going to be working with over in the virtual
00:24:49 - Nugget Lab and the kinds of things we'll be doing.
00:24:51 - I hope this has been informative for you, and I'd
00:24:52 - like to thank you for viewing.

Hadoop Technology Stack

Hadoop Distributed File System (HDFS)

Introduction to MapReduce

Installing Apache Hadoop (Single Node)

Installing Apache Hadoop (Multi Node)

Troubleshooting, Administering and Optimizing Hadoop

Managing HDFS

MapReduce Development

Introduction to Pig

Developing with Pig

Introduction to Hive

Developing with Hive

Introduction to HBase

Developing with HBase

Introduction to Zookeeper

Introduction to Sqoop

Local Hadoop: Cloudera CDH VM

Cloud Hadoop: Amazon EMR

Cloud Hadoop: Microsoft HDInsight

Garth Schulte

CBT Nuggets Trainer


Area of Expertise:
Visual Studio 6, Visual Studio.NET Windows/Web Programming, SQL Server 6.5-2012
