00:00:00 - And welcome to Nugget 103.7. Now this
00:00:05 - time we're going to learn a few things. We're actually going to use
00:00:07 - the tool that you've been using all along.
00:00:11 - We're going to learn to use grep. Now you've had on the job training for the
00:00:14 - past few nuggets all the way through this session, but we're
00:00:17 - going to learn about it officially. We're also going to learn
00:00:20 - about fgrep and egrep,
00:00:24 - which are obviously related to grep but separate tools in and
00:00:28 - of themselves. And then the majority of this section is going
00:00:33 - to be about the mighty regex.
00:00:38 - Now regular expressions or regex, as it's often called, are
00:00:42 - very, very useful ways to sort through text and pick and
00:00:48 - find and sort, analyze and -- regex is awesome stuff. It's
00:00:52 - also kind of confusing, so we're going to take time. At first we're just
00:00:56 - going to learn some simple regular expression tools.
00:01:00 - But I want you to really pay attention to this nugget. It has a weight
00:01:03 - of two, but it will be invaluable to you as a Linux user,
00:01:08 - all right. So let's get down to business. Let's get grep,
00:01:10 - fgrep, and egrep -- we'll add one more to that, too,
00:01:15 - sed. We'll get those out of the way and then we'll spend the
00:01:17 - bulk of the time on regular expressions. So let's get started.
00:01:22 - Okay, first I want to start with the playing field. See, we're
00:01:25 - back in our handy dandy little terminal here, and let's
00:01:29 - see. I've made two files, so we'll look at file.txt and
00:01:34 - we'll look at file2.txt. And these are just some text
00:01:39 - files that I added a bunch of words to, see. So file two has these in it,
00:01:43 - file has these in it. All right, just so you know that that's what's
00:01:47 - in there and that's when I start working with grep and such.
00:01:50 - We're going to be manipulating those files --
00:01:53 - and apparently my speech. So let's clear the screen and we'll
00:01:57 - start with grep. Now there's two ways that you can invoke
00:02:00 - grep. What you can do is you can say grep and then what it
00:02:04 - is you're looking for. So we know, if you remember those, there
00:02:08 - was a couple words with two Os in it. So we'll grep oo from -- not
00:02:13 - fire -- from file.txt, all right. So basically we start with
00:02:17 - the grep, then what it is we're grepping for and where we're grepping
00:02:21 - from. And what we got out of there were these four words.
00:02:26 - It prints the whole line and it shows what was in, you know, what
00:02:29 - lines contained oo. Now you can do some things like
00:02:33 - grep -n to show line numbers.
00:02:37 - oo from file.txt, and there it tells you what line they're in.
00:02:41 - If you're working with code, sometimes that's really convenient.
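The two invocations just described can be sketched as follows. The contents of file.txt are never listed in the transcript, so the word list below is a stand-in pieced together from words mentioned later in the session; treat it as an assumption.

```shell
# Stand-in for the instructor's file.txt (assumed contents, not shown in the transcript)
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

plain=$(grep oo file.txt)          # every line that contains "oo"
numbered=$(grep -n oo file.txt)    # same lines, each prefixed with "line number:"

printf '%s\n' "$plain" "$numbered"
```

With this stand-in file the numbered form prints 3:book, 4:books, 5:Books, and 8:doogy.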
00:02:44 - All right, but right there and you can also, let's see. Now grep
00:02:49 - is case sensitive. So if we were going to look for
00:02:55 - b-o-o from file.txt, it's not going to find the capital
00:03:00 - books. See up here that has a capital B. But grep has a flag for
00:03:04 - that, too -- it's -i, which means case insensitive.
00:03:08 - Search for boo,
00:03:10 - file.txt, and see there it found both of these because it didn't
00:03:14 - care about what case it was, if it was upper case or lower case, all right. So that's
00:03:17 - how grep works. But if you remember, in the past few nuggets we've
00:03:21 - been using grep a different way. What we've been doing is piping
00:03:24 - the cont -- or the results of a command into grep. So
00:03:29 - let's say we were to type ls. You'd get these two files, right.
00:03:32 - If you type ls and then you use the pipe symbol
00:03:37 - into the command,
00:03:39 - grep for i-l-e all right. What we should get is both of those as
00:03:45 - a result. So instead of using a file
00:03:49 - for it to grep for boo or for whatever here, we've used the results
00:03:54 - of ls, which we know that those are the results of ls,
00:03:58 - right, file2 and file, and then we've used pipe to make that the input
00:04:03 - for grep. We learned about this in a previous nugget, but this is
00:04:06 - a way that you often use grep, all right. And then we grepped for i-l-e
00:04:11 - out of the results and we got both of them because both file names
00:04:14 - include i-l-e, all right. So you'll see me use these interchangeably. You know,
00:04:19 - we could also do the same thing that we did up here, if we were
00:04:22 - to say, cat file.txt. See, if you do that, you know that's
00:04:26 - what it contains. So if we did cat file.txt, piped it into grep oo,
00:04:32 - we would get the same results, see. Just as if we invoked
00:04:36 - it saying, use this file; it was the result of using cat on that
00:04:40 - file and then it grepped the results from there. That's a really
00:04:43 - common way that you use grep and I just wanted to show you that before
00:04:47 - we get too deep, so if I start doing things like using pipe it's not
00:04:50 - confusing. All right, so grep is one that we're familiar with and I want
00:04:54 - to show you some of the regular expressions that you can use
00:04:57 - with grep.
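A rough sketch of the case-insensitive flag and the two piped forms described above. The file contents are assumed stand-ins, and file2.txt only needs to exist so that ls has two names to list.

```shell
# Assumed stand-in files; the real contents are not shown in the transcript
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt
printf '%s\n' one two > file2.txt

sensitive=$(grep boo file.txt)        # lower-case matches only
insensitive=$(grep -i boo file.txt)   # -i ignores case, so Books is found too
listing=$(ls | grep ile)              # output of ls becomes grep's input
piped=$(cat file.txt | grep oo)       # same result as: grep oo file.txt

printf '%s\n' "$sensitive" "$insensitive" "$listing" "$piped"
```

The order in which ls lists the two file names can vary with the locale, so only the set of names matters here.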
00:04:59 - There are some special characters that you can use inside
00:05:03 - grep, and they're called regular expression characters.
00:05:07 - Just, they're special characters. For example, let's say that we
00:05:11 - wanted to find all of the
00:05:16 - words that -- oh, let's see. In this, here's a good example. This
00:05:21 - is our file.txt, see these words. Now let's search
00:05:26 - for all the words that have p-l-e in them, all right. So we're going to grep
00:05:33 - p-l-e from file.txt. Now what do you think we'll
00:05:36 - find? We should have two results, right. Purple, p-l-e there; and
00:05:41 - plenty, p-l-e there. But what if we just wanted to search for
00:05:45 - words that started with p-l-e? Well that's possible, that's
00:05:49 - where regular expressions come into play. You would type something like grep
00:05:53 - and then the caret symbol, Shift+6 on a regular U.S. keyboard.
00:05:58 - That signifies the beginning of a line. And then
00:06:03 - p-l-e file.txt.
00:06:06 - And it only showed plenty, not purple, because this is a regular
00:06:11 - expression that says, only at the beginning of a line. So it has to
00:06:14 - be the beginning of a line and then p-l-e, and it only found that one time,
00:06:18 - in the word plenty, all right. Now there is the same thing for end
00:06:22 - of the, end of the line. So our end of the line, grep,
00:06:28 - let's say
00:06:30 - s and end of the line, the dollar sign, okay.
00:06:37 - File.txt. Now what are we going to find? Well it found three of the words
00:06:42 - ended in s, see. So it found blades, books, and Books. And if we
00:06:47 - look, let's test it. Let's see how well it did.
00:06:52 - The whole file, there should only be three that end in s. And sure
00:06:54 - enough, just these three -- blades, books, and Books all end in s.
00:06:59 - So s and then the end of the line character, all right. So that's what that dollar
00:07:04 - sign means. It's a regular expression showing end.
00:07:08 - All right, does that make sense?
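The beginning-of-line and end-of-line anchors can be tried out like this. The word list is an assumed stand-in for file.txt, and the patterns are wrapped in single quotes here as a precaution so the shell leaves the caret and dollar sign alone.

```shell
# Assumed stand-in for file.txt
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

anywhere=$(grep ple file.txt)      # "ple" anywhere in the line
at_start=$(grep '^ple' file.txt)   # ^ anchors the match to the start of the line
at_end=$(grep 's$' file.txt)       # $ anchors the match to the end of the line

printf '%s\n' "$anywhere" "$at_start" "$at_end"
```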
00:07:11 - Now the period means any character. It can be any character.
00:07:15 - It has to be a character, but it can be any character. So let's say we want
00:07:19 - to look for any character and o from file.txt.
00:07:26 - Okay, it found all these because all these had something and then an o.
00:07:30 - Let's see if we can figure out something.
00:07:35 - How about
00:07:37 - grep any character p-l-e from file.txt. What
00:07:43 - do you think is going to happen?
00:07:46 - It found purple, but it didn't find plenty. And why didn't it find plenty?
00:07:50 - Well because if you look, there's no character before the p-l-e
00:07:53 - in plenty, right. This means any character, but it has to be
00:07:57 - a character.
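A small sketch of the period as "exactly one character, whatever it is", again using an assumed stand-in for file.txt.

```shell
# Assumed stand-in for file.txt
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

any_then_o=$(grep '.o' file.txt)   # some character, then an o
before_ple=$(grep '.ple' file.txt) # some character, then "ple"

printf '%s\n' "$any_then_o" "$before_ple"
```

The second search skips plenty because nothing precedes its "ple", so the period has no character to match.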
00:07:59 - Now to get much more complicated with grepping, you're going
00:08:02 - to have to use a different tool, because grep doesn't understand
00:08:06 - all of the regular expressions that you can create. For that
00:08:09 - you need egrep, which is extended grep. Now you could also use
00:08:14 - grep -E, but since you use extended grep
00:08:18 - quite a bit, it's easier just to use egrep as a command all on its own.
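For simple patterns, grep and grep -E give the same answer; a minimal check, assuming the same stand-in file.txt as before. The sketch uses grep -E rather than egrep because some newer GNU grep releases print a warning that egrep is obsolescent.

```shell
# Assumed stand-in for file.txt
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

basic=$(grep '.ple' file.txt)        # basic regular expressions
extended=$(grep -E '.ple' file.txt)  # extended regular expressions, what egrep runs

printf '%s\n' "$basic" "$extended"
```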
00:08:23 - And the only difference, now it'll do all the things that grep will
00:08:25 - do. See, if we do egrep .ple file.txt, it's going
00:08:30 - to show us the same result -- purple. What egrep will allow you to
00:08:34 - do, though, is string different regular expressions together
00:08:37 - and use some things like or and and and things like that. So let's do
00:08:42 - something kind of complicated, all right. Let's type egrep, and now
00:08:48 - this is going to be a regular expression with some spaces in between
00:08:52 - or you know some complicated things, so we're going to put it in
00:08:54 - quotes. Just it's good to put it in quotes or single quotes
00:08:58 - if you're afraid that it's going to be the name of a variable.
00:09:01 - In fact, let's do single quotes. That way, if we
00:09:04 - use the name of a variable, it won't be substituted. It'll just
00:09:07 - use literally what we put in there. So we're going to search for
00:09:11 - something that begins with either b
00:09:18 - or d. Now I'll go through this in a second. B or d, close the single
00:09:24 - quotes, in file.txt. So what we have here, we put
00:09:29 - it in single quotes just because it was an expression, it's just
00:09:33 - what we're searching for.
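The difference between the two kinds of quotes can be seen without running grep at all. In this sketch the variable name word is hypothetical and exists only to show the substitution.

```shell
word=book                 # hypothetical variable, for illustration only

single='^(b|d)$word'      # single quotes: kept exactly as typed
double="^(b|d)$word"      # double quotes: the shell replaces $word with its value

printf '%s\n' "$single" "$double"
```

The first line printed still contains the text $word; the second has book in its place.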
00:09:36 - The caret symbol means the beginning of a line
00:09:39 - and then inside these parentheses is where we're giving a choice.
00:09:43 - So it's going to be b, and then the pipe symbol inside a
00:09:47 - regular expression means or, d; close parentheses. So it's going
00:09:52 - to evaluate that. So it's going to have to begin the line with
00:09:55 - either b or d. File.txt is what we're searching in. So we hit
00:10:00 - Enter, and it's going to show us all of the words that started
00:10:04 - with either b or d. See, so that's kind of, kind of nice. But what
00:10:10 - if we didn't want blades? So then we can string things together.
00:10:15 - Egrep, type the same sort of thing because we want all the
00:10:19 - words that begin with
00:10:21 - b or d,
00:10:25 - and then we're going to say, and then after that have
00:10:34 - o-o. Single quotes,
00:10:38 - file.txt. And so now it found all of the letters that
00:10:44 - start -- or all of the words in -- or all the lines that start with b or
00:10:48 - d, and then immediately have o-o after them. See how we've
00:10:53 - strung those regular expressions together. Now it gets kind of complicated.
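The two commands described above probably look something like the sketch below. The exact text typed on screen is not in the transcript, the file contents are an assumed stand-in, and grep -E is used as the equivalent of egrep.

```shell
# Assumed stand-in for file.txt
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

b_or_d=$(grep -E '^(b|d)' file.txt)       # line starts with b or with d
b_or_d_oo=$(grep -E '^(b|d)oo' file.txt)  # ...and "oo" follows immediately

printf '%s\n' "$b_or_d" "$b_or_d_oo"
```

Adding oo after the group is what drops blades from the results.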
00:10:57 - You could also do a range. So instead of b or d, let's say
00:11:03 - that we want --
00:11:07 - you can actually use grep for a range. I believe that it's safe to
00:11:11 - use egrep.
00:11:13 - And we're going to search for all the words that start -- now
00:11:18 - to do a range, you're going to open a bracket here, okay, the
00:11:21 - square brackets. So all of them that start with anywhere from
00:11:26 - a to k.
00:11:30 - Oh, but what did I forget. I want it to start, the line to start
00:11:34 - with that. Any letters a through k
00:11:38 - in file.txt. So what do we have here. Again, single quotes,
00:11:42 - we just put it inside single quotes, and we're going to search for
00:11:46 - any word, the line that begins with any range between a and k.
00:11:51 - So a, b, c, d, e, f, g, h, i, j, k in file.txt. And what do
00:11:56 - we get. We got blade, book, books, kitten, doogy, and fast with the
00:12:02 - dollar sign in there, okay. So all of those, are all those in between a and
00:12:07 - k? Yes. Now what is missing?
00:12:10 - Well if you, let's
00:12:12 - look. Cat file.txt.
00:12:14 - Now it didn't find this Books because it's a capital. We
00:12:20 - just did the lower case range, all right.
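A sketch of the lower-case range search, with an assumed stand-in for file.txt. Setting LC_ALL=C is an addition that is not in the video: in some locales a range such as a-k can also take in upper-case letters, and the C locale keeps it to plain lower-case a through k.

```shell
# Assumed stand-in for file.txt
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

# Lines whose first character is a lower-case letter from a through k
lower=$(LC_ALL=C grep '^[a-k]' file.txt)

printf '%s\n' "$lower"
```

Capital Books is missing from the output, which matches what the speaker points out.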
00:12:24 - Let's get super crazy. Let's start nesting. This will be the last
00:12:28 - one we do on egrep here because we're going to get crazy.
00:12:31 - Now there are other, other regular expressions but as long as
00:12:34 - you can string together some craziness here, you should be,
00:12:36 - you should be good. Let's do egrep,
00:12:40 - open with the single quote here, something that begins with
00:12:45 - either
00:12:49 - a through k or
00:12:54 - A through K.
00:12:58 - See what I did? See the difference there?
00:13:01 - Single quote. File.txt.
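The command being typed here is not spelled out in the transcript; it is presumably close to the first pattern in this sketch. The second pattern is an alternative of my own choosing, not necessarily the one shown on screen: a single bracket expression that holds both ranges. The file contents and the LC_ALL=C setting are assumptions, as before.

```shell
# Assumed stand-in for file.txt
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

# Either a lower-case a-k or an upper-case A-K at the start of the line
nested=$(LC_ALL=C grep -E '^([a-k]|[A-K])' file.txt)

# One bracket expression listing both ranges matches the same lines
combined=$(LC_ALL=C grep '^[a-kA-K]' file.txt)

printf '%s\n' "$nested"
```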
00:13:04 - So now what did it do? It did the same thing it did last time,
00:13:10 - except it also found this because we said -- again, now this is really
00:13:15 - complicated, but we said what begins with and it would probably --
00:13:22 - we could also --
00:13:25 - if you like this better, the way it, if it makes it more clear.
00:13:31 - You could do it that way and it does the same thing. So again, for
00:13:35 - everybody trying to keep track here, it's saying the begins
00:13:39 - with, so the line begins with, and then match up these parentheses,
00:13:43 - everything inside here that is either a through k lower case.
00:13:48 - The pipe symbol means or, A through K upper case. And then we
00:13:54 - told it file.txt. So see how we keep stringing these things
00:13:57 - together with egrep. We have a lot more, extended grep
00:14:00 - gives us a lot more things like grouping and the or symbol and that
00:14:04 - sort of thing. So that's how you string regular expressions
00:14:07 - together. And then lastly,
00:14:10 - we have fgrep. Now fgrep is much different.
00:14:15 - Fgrep is fast grep, it's often called fast grep, and the
00:14:19 - reason is because let's say we wanted to search for --
00:14:23 - do you remember way back in the beginning we searched for the words
00:14:26 - that end
00:14:29 - in -- where's my dollar sign --
00:14:32 - end in s file.txt. Actually here, let me, just to refresh your
00:14:37 - memory here. So if we do this, it's going to show us all the
00:14:40 - words that end in s. Because remember the dollar sign is
00:14:43 - the regular expression symbol for the end of the line, right.
00:14:47 - But if we did that same thing with fgrep,
00:14:52 - s and dollar sign file.txt, it's not going to find anything.
00:14:58 - Because fast grep doesn't recognize regular expressions
00:15:02 - at all. Where fgrep is really useful is where you don't want regular
00:15:06 - expressions to be evaluated. For example, let's say we're trying
00:15:10 - to search for f-a dollar sign
00:15:14 - in file.txt. Again, we're trying to find this
00:15:18 - line, right, that's what we're trying to do. Well if we use regular grep, it's
00:15:22 - not going to show anything. Because remember, it's evaluating
00:15:24 - this as, find me any line that has f-a as the last line or
00:15:30 - as the last character in the line, and that's not what we're
00:15:33 - looking for. If you use fgrep,
00:15:36 - f-a
00:15:39 - dollar sign file.txt, it's going to find it because
00:15:43 - it ignores all that stuff. So fgrep is useful if you're in a situation
00:15:47 - where it might be interpreted as regular expressions and
00:15:50 - you don't want it to. Alright, so that rounds up the
00:15:54 - grep, egrep, and fgrep. So grep is the regular tool that you use. Egrep
00:15:59 - is extended grep, which provides a lot more functionality like the or
00:16:02 - symbol and all these regular expressions that aren't supported
00:16:05 - regularly. And then fgrep is kind of the opposite. It's fast grep.
00:16:09 - It just searches for this literal string that you type in on
00:16:12 - the line right there. And those are the three greps. So really
00:16:16 - quickly here, remember grep is the one that uses some
00:16:22 - regular expressions. Fgrep
00:16:24 - only interprets literal strings that you supply to it.
00:16:32 - And egrep is extended
00:16:36 - regex, or regular expressions. So egrep is the one that you
00:16:40 - can do all that fancy stuff in. And again, egrep does the same thing
00:16:43 - that grep does. So if you aren't sure if grep supports the
00:16:47 - regular expressions that you're trying to use, you can use
00:16:49 - egrep. That's not going to harm you if you're using regular expressions.
00:16:52 - This just supports more than grep does out of the box. All right, so those
00:16:56 - are the three greps. Just try to remember what the difference is
00:17:00 - between them. It's easy for me to remember F as in fast
00:17:03 - and E as in extended, and then grep is just regular grep.
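The contrast between a regular-expression search and a literal one can be sketched as follows. The file is an assumed stand-in that includes a line reading fa$, and grep -F is used as the equivalent of fgrep. The "|| true" is only there so that an empty result does not stop a script running with set -e.

```shell
# Assumed stand-in for file.txt, including a line that literally reads fa$
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

regex_s=$(grep 's$' file.txt)                # $ means end of line: words ending in s
literal_s=$(grep -F 's$' file.txt || true)   # literal "s$": no such text, so empty
regex_fa=$(grep 'fa$' file.txt || true)      # lines ending in "fa": none, so empty
literal_fa=$(grep -F 'fa$' file.txt)         # literal "fa$": finds that line

printf '%s\n' "$regex_s" "$literal_fa"
```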
00:17:08 - All right. So while grep is normally used or is always used to search
00:17:13 - through files for specific things using regular expressions
00:17:16 - or not, sed or stream editor is the tool that you use for editing
00:17:21 - things on the command line. Now this seemingly simple tool can
00:17:24 - be very, very powerful. We're just going to introduce it here so you can do
00:17:28 - some simple manipulation with it. And basically you type
00:17:31 - sed, and then -e to tell it that an editing expression comes
00:17:35 - next. And then in single quotes, you give it the command
00:17:39 - that you're going to do or your action. It starts with the action.
00:17:42 - We're going to substitute, so s means substitute, then
00:17:46 - a forward slash, and then you pick what you're going to substitute.
00:17:50 - So let's say we want to search for all occurrences of o-o. Now
00:17:54 - this can be regular expressions, too. So we'll start
00:17:58 - with simple text. So we're going to search for o-o, and another forward slash,
00:18:02 - and we're going to replace it with zero zero, okay. And then
00:18:08 - we close that
00:18:10 - and the single quotation to close it. And then what file are we going to
00:18:16 - work on. Well, we're going to work on our trusty file.txt,
00:18:20 - all right. So what this is going to do, if we hit Enter it actually
00:18:23 - puts it onto the screen here. It doesn't actually change the file.
00:18:26 - If we wanted to create a file, remember redirection. We would
00:18:29 - have added
00:18:31 - the greater than newfile.txt and it would put the results in
00:18:35 - a file. But I actually wanted to show you what it does here.
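A minimal sketch of the substitution and of the redirection into a new file, using an assumed stand-in for file.txt. One detail beyond what is said in the video: without a trailing g, the s command replaces only the first match on each line, which makes no difference for these words.

```shell
# Assumed stand-in for file.txt
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

changed=$(sed -e 's/oo/00/' file.txt)      # result goes to the screen (captured here)
sed -e 's/oo/00/' file.txt > newfile.txt   # redirection saves the result instead
still_oo=$(grep -c oo file.txt)            # file.txt itself is left untouched

printf '%s\n' "$changed"
```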
00:18:38 - So it's taken all of our two Os and changed it to two zeros,
00:18:43 - see, just like we told it to do in the file, which is really,
00:18:46 - really cool. That's exactly what it did. So it took -- we
00:18:50 - substituted o-o for zero zero. If you want to do anything a little
00:18:55 - more creative with sed that you want to use extended regular
00:19:00 - expressions like we did with egrep, well there's
00:19:03 - a command for that, too. Basically, you need to type sed -r
00:19:07 - for extended regular expressions, plus -e, and then you can start
00:19:11 - doing all kinds of crazy things. Like we want to substitute
00:19:16 - anything that
00:19:20 - begins with either B
00:19:24 - or b.
00:19:27 - So it begins with either B or b.
00:19:31 - We want to substitute that with
00:19:35 - capital C, in file.txt. So now what is this going to do? Well what
00:19:41 - it did, it took all of our words that started with either
00:19:45 - B or b, again upper case or lower case B, and replaced it with
00:19:49 - an upper case C. And sure enough, we have Clades, Cooks, Cooks, and Cook.
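The extended substitution can be sketched like this, with an assumed stand-in for file.txt. The sketch uses -E, which GNU sed treats as a synonym for the -r shown in the video and which BSD sed also accepts.

```shell
# Assumed stand-in for file.txt
cd "$(mktemp -d)"
printf '%s\n' purple plenty book books Books blades kitten doogy 'fa$' > file.txt

# Replace a leading B or b with a capital C; other lines pass through unchanged
swapped=$(sed -E -e 's/^(B|b)/C/' file.txt)

printf '%s\n' "$swapped"
```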
00:19:54 - So you can use regular expressions, even complicated ones,
00:19:57 - right inside sed, the stream editor. But you have to remember
00:20:01 - to use that -r flag if you're going to use any complicated
00:20:05 - regular expressions. A few simple ones, just like with grep,
00:20:08 - a few simple ones you can use without that flag. But regular
00:20:11 - expressions, extended regular expressions, you have to use that
00:20:14 - -r flag along with the -e flag, alright. So that's how you can actually
00:20:17 - change the text. Rather than just finding it, you can make changes
00:20:21 - by using this substitute command in using sed as your stream
00:20:26 - editor. All right, so we've gone over grep; fgrep; egrep; sed, the
00:20:31 - stream editor; and now you're ready to go over the regular expressions,
00:20:35 - the big scary part that I told you about. Well it turns out, we pretty much
00:20:39 - already did. I tricked you into it there while we were going over
00:20:42 - the other things. We covered what regular expressions are. I
00:20:45 - mean, this is quite a set of regular expressions right here.
00:20:49 - And you know what's going on here. We're substituting this
00:20:54 - regular expression for this regular expression, and this is actually
00:20:58 - just a, you know, we're substituting it for that and close it. So you've done,
00:21:02 - you've done regular expressions throughout this entire nugget.
00:21:05 - So the only thing that I might say is man regex, just
00:21:10 - to give you some more examples of some things that you
00:21:15 - can do. And it gets a lot more complicated than you'll need
00:21:17 - for this nugget, but it's one of those things if you can learn
00:21:21 - regular expressions, you will just be a guru that everybody will
00:21:24 - ask you to write these really complicated forums -- or not forums, formulas --
00:21:29 - for them to do because not many people really understand
00:21:32 - regular expressions that well. But after this nugget, you're
00:21:36 - one of the few that at least understand it somewhat.
00:21:39 - So you've done it. You learned all about grep; the fast
00:21:43 - grep that just does literal searches; egrep for extended regular
00:21:48 - expressions searching; and sed, the stream editor, that
00:21:52 - is a lot more powerful than it seems by the little command line
00:21:56 - editing that we just did. But sed, the stream editor, is amazingly
00:22:00 - powerful when you're doing scripting. And throughout all of that,
00:22:03 - you learned the big and powerful regular expressions, at least quite
00:22:08 - an introduction to it. Again, I don't think anybody can ever
00:22:12 - know everything there is to know about regular expressions,
00:22:15 - but it's a great tool and you should learn as much as you can
00:22:18 - about it. I hope that this has been informative for you, and I'd
00:22:22 - like to thank you for viewing.