00:00:01 - Welcome to section 103.2. In this nugget we're going
00:00:04 - to talk about processing text streams using filters. Now basically
00:00:08 - what this nugget covers is the textutils package, this
00:00:12 - includes a whole bunch of text manipulating programs and
00:00:17 - it's going to be this long laundry list of things that we're
00:00:20 - going to go over. This is going to be a nugget that if you're
00:00:23 - a note taker, you're just going to want to write down what these
00:00:25 - things do, maybe play it back so you can see them. Day to day
00:00:29 - activities you may not use very many of these, however you could
00:00:32 - be tested on any or all of them in the test
00:00:37 - itself. So we're going to go through them one by one. These
00:00:39 - are the different commands that are mentioned in the
00:00:41 - LPIC exam requirements, so we're going to show you what each one
00:00:46 - of them does, try to figure out where you might use them and
00:00:50 - hopefully you can commit them to memory. So let's begin.
00:00:54 - Okay, on the computer here, I've set up a few different files.
00:00:58 - I have hello and hello2, both .txt. Now those are the files
00:01:02 - that we're going to use and we're going to manipulate them
00:01:04 - with these different tools that are part of the textutils package.
00:01:08 - So we're going to start with the very first one and what we're
00:01:11 - going to do first is look at the man page for the command itself.
00:01:15 - So our first one is cat, so let's look at the man page for
00:01:18 - cat, cat stands for Concatenate. And what it basically does is it
00:01:23 - will dump all of the files that are listed into one
00:01:28 - file or one output. There's some other things you can do, you
00:01:32 - can number the output lines, you can do a whole bunch
00:01:35 - of things to them, but basically what this does is, here, we'll do
00:01:40 - cat hello.txt. And basically it just prints this
00:01:45 - to the screen, so it concatenates them together. If we were to do cat
00:01:50 - hello.txt and hello2.txt, it will print them
00:01:55 - both. And you'll see I have this file, which we looked
00:01:58 - at, this is hello.txt, it has these lines on it. And then
00:02:02 - this is the file hello2.txt, it just has this line, it's
00:02:06 - separated by tabs, we're going to use that later in one of our
00:02:10 - commands, these are the two files and catting does that.
00:02:13 - Now what you'll generally do is use redirection. Say we wanted
00:02:16 - to make a file that contained both of those: we will cat hello.txt
00:02:21 - and hello2.txt and then we'll redirect that into hello3.txt.
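Typed at the prompt, the cat steps described here might look roughly like the sketch below. The contents of hello.txt are reconstructed from what is read out later in the nugget, and the contents of hello2.txt are purely an assumption (the narration only says it is one line separated by tabs).

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d) && cd "$workdir"

# Sample files (hello2.txt contents are assumed for illustration).
printf 'This is a test\na big test\n.\n' > hello.txt
printf 'one\ttwo\tthree four\n' > hello2.txt

cat hello.txt                            # prints one file to the screen
cat hello.txt hello2.txt                 # prints both files, one after the other
cat hello.txt hello2.txt > hello3.txt    # same output, but sent to a new file

ls                # hello3.txt now exists
cat hello3.txt    # and holds the contents of both files
```

The redirect prints nothing because the output that would have gone to the screen goes into hello3.txt instead.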
00:02:27 - Now nothing shows on the screen, but that's because the output went
00:02:30 - somewhere else. If we do an ls command, we'll see it created the file; basically
00:02:33 - it redirected that output into a file. And if we look at hello3.txt,
00:02:37 - we'll see that it contains both of those files put
00:02:43 - together. So we concatenated those two files into one
00:02:47 - called hello3.txt. Now we're not really going to use
00:02:50 - hello3.txt for the rest of the nugget, so we'll get
00:02:54 - rid of that. See, so we just have hello and hello2. Let's clear
00:02:58 - the screen and we'll move on to our next command. So cat's
00:03:02 - one that you're going to use probably very often, because that's an easy
00:03:05 - way to look at like a text file just to see what's inside of
00:03:07 - it, so that's cat. Next on our list, okay, we have the cut command.
00:03:14 - Now let's look at cut and see what it does, look at the man page
00:03:17 - for cut. Now cut will pull selected sections out of each line
00:03:21 - of the text files that you input. And then you can tell it how
00:03:25 - it's going to select that information. It can be by
00:03:30 - bytes or characters or fields split on a delimiter, like if you have, oh,
00:03:34 - a spreadsheet type thing, with fields separated by commas or tabs.
00:03:39 - What you can do is just display fields based on that delimiter, okay.
00:03:43 - We are going to give you an example, let's look at what our
00:03:48 - file looks like. So we're going to cat hello.txt, all right, that's our file,
00:03:52 - so we know what it is we're working with. So let's use
00:03:55 - the cut command. We're going to cut based on character position,
00:03:59 - okay. So let's cut characters two, three, four and five from hello.txt.
00:04:06 - Now if we just wanted to put the results in a file, again
00:04:10 - we would use that greater-than sign and then a file
00:04:13 - name, but I actually want to look at the results, so we are
00:04:17 - not going to give it a file to redirect into, we're just going to
00:04:21 - display it on the screen. So let's look and see what it gives
00:04:23 - us. Okay let's see if this makes sense. Now cut should have
00:04:27 - given us characters two, three, four and five from every line
00:04:31 - of hello.txt. Let's look at the first line, This is a test, okay. Characters
00:04:36 - two, three, four and five, so we should have h-i-s and a space,
00:04:42 - h-i-s and we can assume there's a space there. That's correct.
00:04:45 - Now on the next line, again it does line by line, we should
00:04:48 - get character two, three, four and five, so that makes sense, we
00:04:53 - should have a space and then b-i-g, so space b-i-g, perfect.
00:04:58 - And now I have this blank line. Well that makes sense, because
00:05:02 - there's only one character in this line, character number one,
00:05:05 - right. We don't want that, we're only looking for characters two,
00:05:08 - three, four and five, which in this case, happen to be, there are
00:05:12 - none, so it's going to give us a blank line. So cut has done
00:05:15 - exactly what we expect it to do, it's given us the characters
00:05:19 - that we told it to cut out and displayed it on the screen for us.
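A minimal sketch of the cut run walked through above, assuming hello.txt holds the three lines read out in the narration. The comma-separated line at the end is a made-up example of the delimiter mode mentioned from the man page.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'This is a test\na big test\n.\n' > hello.txt

# Characters 2 through 5 of every line.
cut -c 2-5 hello.txt
# line 1 gives "his " (with a trailing space)
# line 2 gives " big" (with a leading space)
# line 3 gives an empty line, since "." has only one character

# Field mode: split on a delimiter and keep only field 2 (hypothetical data).
printf 'a,b,c\n' | cut -d , -f 2
```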
00:05:22 - Okay, so that's how cut works, you can use it in scripts or you
00:05:26 - can use that however you want to pull specific text out of
00:05:29 - text files, all right. So that's cut. Okay, next up on our list we
00:05:33 - have expand. So let's look at the man page for it or the
00:05:37 - manual page. And what this does, it simply converts tabs to spaces.
00:05:43 - So it will convert tabs to spaces. Now not just one space, but
00:05:47 - let, let's look so you can see how this works. Now if we do, if we
00:05:51 - do a cat of
00:05:54 - hello2.txt, you'll see that this is separated by
00:05:58 - tabs. And if you watch we can prove that, okay. We go character
00:06:01 - by character, it'll highlight. And we get to this, it does that
00:06:04 - whole space there, because that's a tab character,
00:06:08 - right. Same thing over here, this is a tab character. Now here
00:06:13 - there's just a space between there, not a tab, but these other ones
00:06:16 - are tabs, okay. So I made this file specifically for this kind
00:06:20 - of thing so you can see that these are tabs, all right. So now
00:06:23 - what we're going to do is run the expand command,
00:06:27 - expand hello2.txt. And again we're not going to
00:06:32 - redirect it into a file, we're just
00:06:35 - going to display it on the screen. Now it doesn't look any different, does
00:06:39 - it? It looks exactly the same, but if we run that same little test,
00:06:43 - remember, in the original this shows up as just one character, a tab character.
00:06:47 - If we come to this line, all of these are single characters, and it's
00:06:50 - converted that tab to one, two, three spaces. So this tab has
00:06:56 - been converted to three spaces, you see the difference there?
00:07:00 - Instead of being one character, now there's three characters
00:07:03 - here, all right. Same thing over here, it's done the same thing,
00:07:06 - one, two, three,
00:07:09 - one, two, three. And look what it did: this one has four spaces.
00:07:16 - This one has four spaces too, but this one only has one, two, three.
00:07:21 - Well that's because it's padding each tab out to the next tab stop, lining them up just like the tab did, okay?
00:07:28 - So it's converted the tabs into spaces to make sure they line
00:07:32 - up exactly the same.
00:07:34 - All right, does that make sense? That's what expand does. And if
00:07:37 - we look back at the man page, there are some other things that
00:07:41 - we can do.
00:07:43 - We can have the tab stops be this many characters apart, instead
00:07:48 - of the default of eight, so tabs can be a specific width. We can
00:07:54 - not convert tabs after non-blanks. So for example,
00:07:58 - if you only want to convert the initial tabs on a line, and leave alone tabs that come after
00:08:02 - characters that are not blanks, then it will do
00:08:06 - that. You can use a comma-separated list of explicit
00:08:10 - tab positions. So, for example, you can tell it exactly where the tab
00:08:14 - stops are instead of using evenly spaced ones. And
00:08:18 - there's a couple of other things you can do. But basically
00:08:20 - what you need to know about expand is it will convert a file
00:08:23 - full of tabs into a file that just has spaces where the tabs
00:08:27 - would be. So that is how expand works.
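Here is a small sketch of expand, using an assumed one-line hello2.txt (the real file's contents are not given in the narration). It shows why the number of spaces varies: each tab is padded out to the next tab stop rather than swapped for a fixed count.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'one\ttwo\tthree four\n' > hello2.txt   # assumed contents

# Default tab stops are every 8 columns, so "one" (3 characters)
# is followed by 5 spaces to reach column 8.
expand hello2.txt

# -t sets the tab stop spacing; with stops every 4 columns the
# same tabs each become a single space in this particular line.
expand -t 4 hello2.txt
```

Piping either result through od -c is one way to confirm that no tab characters remain.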
00:08:31 - And we'll move on. Okay, the next command we're going to look at,
00:08:34 - let's look at the manual page for it, is f-m-t,
00:08:37 - or format, right. What it's going to do is format text for
00:08:41 - you. Now this is one that might not seem terribly useful,
00:08:45 - but an example of this is if you've ever tried to,
00:08:49 - somebody's emailed you something, you try to copy and paste that
00:08:51 - into a document and all of a sudden, there's new line characters
00:08:55 - in really weird places. And you have to go through
00:08:57 - and eliminate that new line character all over the place
00:09:01 - or maybe you've never had that happen, but boy, I sure have. And
00:09:04 - f-m-t command is one that can help us out. So let's look
00:09:07 - again at hello.txt,
00:09:09 - all right. There's our file. Now if we run f-m-t with no
00:09:14 - flags on hello.txt, it's going to put it all on
00:09:19 - one line, all right. See, it's going to strip out all those extra
00:09:23 - new line characters that we have. Now that may just seem like, well, let's put everything
00:09:27 - on one line, but it's actually done a little bit more than that. It's
00:09:30 - followed some parameters. One of the things we can set
00:09:33 - is, what if we said that we had to have a width of only six,
00:09:39 - five I guess, because I hit the wrong number, a width of five characters, that's, we want to
00:09:43 - format it so that it's only five characters wide for the entire
00:09:46 - document, right. We'll do hello.txt, and what do you expect
00:09:51 - is going to happen? It's going to take that same input, those
00:09:53 - three lines and it's going to format it into a document that's only
00:09:57 - five characters wide.
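As a rough sketch of the fmt runs in this part, assuming the same three-line hello.txt and the GNU version of fmt (other implementations may treat some lines differently):

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'This is a test\na big test\n.\n' > hello.txt

# No flags: the short lines are joined and refilled to the default width.
fmt hello.txt

# -w sets the maximum line width; the same words are reflowed to fit.
fmt -w 5 hello.txt
fmt -w 10 hello.txt
```

fmt only moves line breaks around; the words themselves come out unchanged and in the same order.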
00:09:59 - Well that's what it did, it took all of those characters and
00:10:02 - it made sure that it didn't go any more than five characters
00:10:06 - wide. Now we can do that again with something a little bit more
00:10:10 - generous. Let's do 10 characters wide. Now see it's put a
00:10:14 - couple of words on some of the lines there, because this is how
00:10:16 - it's reflowed this paragraph so that no line is longer than 10 characters,
00:10:21 - all right. Ten characters wide, it's pretty slick. It's a great
00:10:25 - way that you can format or fix poorly formatted blocks
00:10:30 - of text and, you know, put them in a way that's more useful for
00:10:33 - you. Again for me it's that copying, pasting emails where a whole
00:10:36 - bunch of random line breaks were put in and it's just a pain
00:10:40 - in the tuchus. So, anyway, that's the f-m-t
00:10:44 - command and it is kind of useful at times. All right,
00:10:48 - next we're going to look at a command called head,
00:10:53 - h-e-a-d. So let's look at the manual page for it. Now what this
00:10:57 - does, it just outputs the first part of files, okay. By default
00:11:02 - it'll do the first 10 lines of any file that you give it, it'll
00:11:05 - display it on the screen. Now you can change that number with
00:11:09 - -n and however many lines you want, all right. So
00:11:12 - let's play with this a little bit and see what we have. Now our
00:11:14 - files are pretty short. So let's say we want to look at
00:11:19 - the first 10 lines of /var/log/syslog, all right. We're going
00:11:23 - to look at the first 10 lines. And what it's going to do is show
00:11:26 - us the first 10 lines of our log file, the syslog file,
00:11:30 - okay. So we can see what happened at the very
00:11:33 - beginning of that log file. Well let's clear the
00:11:36 - screen and we'll look back at our file. So you can see that's
00:11:39 - one of the ways you can look at the beginning of a really long
00:11:41 - file, it'll just give you those few lines. So let's look
00:11:44 - one more time at hello.txt, okay, it's a file that we are getting
00:11:48 - very familiar with. Let's say we wanted to look at the first
00:11:52 - two lines of hello.txt. This should just give us the
00:11:57 - first two lines, because we told it number of lines is two
00:12:00 - and it did. See it left off that third line, it just showed us the beginning
00:12:03 - of it. Now I had to specify minus two, otherwise it would show us the
00:12:07 - first 10 lines and this doesn't even have 10 lines. So it
00:12:10 - would just show us the whole file, see. We do hello.txt,
00:12:15 - it's just going to show us the whole file. And that's
00:12:17 - not what we were looking for. So anyway, that is the head command.
00:12:21 - And we're going to look at its little brother, the tail command,
00:12:25 - a little bit later, but that's how the head command works.
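A quick sketch of head along the lines of what was just shown. Since /var/log/syslog may not exist or be readable on every system, numbers.txt below is a hypothetical stand-in for a long file.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'This is a test\na big test\n.\n' > hello.txt
seq 1 25 > numbers.txt    # hypothetical 25-line file standing in for a log

head numbers.txt          # default: the first 10 lines
head -n 2 hello.txt       # only the first 2 lines
head hello.txt            # file has fewer than 10 lines, so all of it is shown
```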
00:12:27 - Okay, this next command, I'll admit, is a little bit confusing, it's od,
00:12:31 - okay. So man od, it stands for octal dump. And what
00:12:36 - this does is it shows you a representation of the contents
00:12:40 - of a file. Now a time you might want to use this is if you're
00:12:43 - trying to look at a binary file that has non-printable characters
00:12:47 - in it, because non-printable means we won't be able to see them. But
00:12:51 - let's see how it works on our file here, okay. We'll
00:12:54 - say, od hello.txt, and what it should do is give us an,
00:13:00 - it's going to default to an octal or base eight representation
00:13:04 - of the characters inside of this. So it's not terribly helpful
00:13:08 - for us right now as far as seeing what it does, but
00:13:12 - what it shows us is, again in octal representation, what characters
00:13:17 - are inside there. Now we could do this: instead
00:13:20 - of the octal form, we could have it show characters, with
00:13:25 - escape sequences for the non-printable ones. So we'll do that on hello.txt;
00:13:29 - the -c flag tells it to use character representation.
00:13:32 - And you'll see most of these are characters, all right. So
00:13:35 - it shows us, again the position in the file, but this is a test
00:13:40 - and then you'll see that \n, again that's the new
00:13:43 - line, so that's where our new line character, our new line comes
00:13:46 - in. That should make sense, because you remember that
00:13:49 - then our next line says a space big test, a new line, dot
00:13:55 - on that blank line all on its own and then a new line where I hit enter on that.
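The two od runs can be sketched like this, again assuming the three-line hello.txt. Exact column spacing varies between implementations, so no full output is reproduced here.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'This is a test\na big test\n.\n' > hello.txt

# Default: octal dump. The left column is the offset into the file, in octal.
od hello.txt

# -c: show each byte as a character, with escapes such as \n for newline.
od -c hello.txt
```

The last line of either dump is just the final offset. The file is 28 bytes long, which od writes in octal as 0000034.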
00:13:59 - So this is showing us the contents of the file in a way that
00:14:03 - you can see non-printable characters, because n, or \n,
00:14:07 - is telling us it's a new line, but that's not something
00:14:09 - that we can actually see in a printed out thing. So
00:14:12 - od is called octal dump. And it will do things in a
00:14:15 - couple different formats, but it will show us the contents of
00:14:18 - a file in the way that we can see it on the screen. So that's
00:14:22 - what od does, and that's what it's for. Okay, now we have
00:14:26 - the join command. And let's look at join here.
00:14:31 - I have taken this and I've created a couple files for us, because
00:14:35 - what join does is it'll merge two files based on a common field as
00:14:39 - if it were a database or spreadsheet, okay. So we can base that
00:14:44 - on a specific field, it defaults to the first field and will try to
00:14:47 - match them. And that's actually how the text files that I
00:14:50 - have set up work, but you can actually tell it which field
00:14:54 - you want to join on. We're going to stick with field one.
00:14:56 - See, here are the files I created. So let's look at the
00:15:01 - first text file and the second text file. Now you'll see that there
00:15:06 - is a number here in the first field and a color here in the
00:15:09 - second field, this is just separated by a space, by the way. And then
00:15:12 - down here in number two, I have different articles of clothing.
00:15:18 - So what we're going to do, let's run the join command, join 1.txt
00:15:22 - and 2.txt. And what we should end up with is a merging of
00:15:26 - those two files based on common values in that
00:15:31 - specific field. So what we end up with, again this is the field,
00:15:35 - field one, and if it finds a common field or a common value,
00:15:40 - like 10, it's going to merge them. Now you'll notice that the
00:15:43 - 10 only appears once, it doesn't put the common field value
00:15:47 - in there more than once, it just appends the rest of each line.
00:15:50 - So we have 10 blue socks, 20 red shirts,
00:15:53 - 30 purple pants, 40 orange hats and 50 black gloves, because all of
00:15:57 - those values lined up in these fields. If one
00:16:01 - of these didn't match, it just wouldn't include that line. So we would,
00:16:04 - like say it was, instead of 30 pants, we had 25 pants.
00:16:07 - Well, if that was the case, we would just end up with that line
00:16:11 - missing. So we'd go 10 blue socks, 20 red shirts, 40 orange
00:16:15 - hats and 50 black gloves and that's all we would end up
00:16:17 - with. But that's how the join command works, by joining
00:16:22 - two text files based on a common value in a specified field.
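A sketch of the join example, with 1.txt and 2.txt rebuilt from the values read out above. The file 2b.txt is a hypothetical variant used to show the "25 pants" case that was only described, not run.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf '10 blue\n20 red\n30 purple\n40 orange\n50 black\n' > 1.txt
printf '10 socks\n20 shirts\n30 pants\n40 hats\n50 gloves\n' > 2.txt

# Join on the first field (the default); the shared key is printed once.
join 1.txt 2.txt

# Hypothetical variant: 25 instead of 30, so that key has no partner.
printf '10 socks\n20 shirts\n25 pants\n40 hats\n50 gloves\n' > 2b.txt
join 1.txt 2b.txt    # the 30 and 25 lines are both left out
```

One thing worth knowing: join expects both files to be sorted on the join field, which these already are.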
00:16:27 - Okay, next on our list is the nl command. So let's
00:16:31 - look at the manual page for it. Now this is going to number
00:16:36 - the lines of a file. So a lot of times you'll want to know,
00:16:39 - let's say you want to print something out with line numbers on
00:16:42 - it. So you can say well, in line number 876
00:16:45 - of your code, it got messed up or something like that. Well
00:16:48 - this is a way that you can output a file and
00:16:53 - number the lines. Now you can do a bunch of things like change
00:16:56 - how it's numbered, or change how sections
00:16:59 - are delimited. Like let's say you wanted to use a specific character
00:17:04 - here for separating pages. Well basically it will allow
00:17:09 - you to do that. But let's start in a very simple way. Let's number the lines
00:17:13 - on hello.txt. So see what it did here? It put on our line
00:17:20 - numbers, right at the beginning, also added some spaces for formatting, but
00:17:24 - this is a test, a big test with a dot. And it put those line
00:17:26 - numbers. It'll do that same thing, I mean we still have those files
00:17:30 - I made, so let's look at line numbering this file with all
00:17:34 - these shirts and socks and stuff. See it just puts one, two, three,
00:17:38 - four, five, it numbers the lines in a file for you. So it
00:17:42 - counts those lines up and then actually adds the number.
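For reference, the nl runs might look like the sketch below, assuming the same hello.txt and 2.txt contents used earlier in the nugget.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'This is a test\na big test\n.\n' > hello.txt
printf '10 socks\n20 shirts\n30 pants\n40 hats\n50 gloves\n' > 2.txt

# Each line is prefixed with a right-aligned number and a tab.
nl hello.txt
nl 2.txt
```

By default nl numbers only non-empty lines, which makes no difference here because neither file has blank lines.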
00:17:45 - So that it's easier to view, easier to sort out where you're
00:17:48 - looking or what you're looking at in a specific file. So that's
00:17:51 - nl. Pretty simple, but that's how it works. Okay, let's clear
00:17:55 - this and next we are going to look at paste. Now what paste does
00:18:01 - is it merges lines of files. So this is an actually difficult
00:18:06 - thing to do if you're in a text editor, to try to paste lines
00:18:10 - right next to each other. Let me show you exactly what we're
00:18:13 - talking about, and you'll see what I mean how it's kind of difficult
00:18:15 - to do and this is a powerful tool. So we have our two files,
00:18:19 - 1.txt and 2.txt. Let's look at them again just so we know what
00:18:23 - we're talking about.
00:18:25 - Okay, so we have these two things. Now what would really be difficult
00:18:28 - is to have line one go over here and line two. Like literally
00:18:33 - if we were to cut this out with a pair of scissors and paste
00:18:36 - that block right here. Well that's kind of hard to do with
00:18:39 - a text editor, but that's where the paste program comes in,
00:18:43 - all right? So if we were to paste
00:18:46 - 1.txt and 2.txt together, it's going to create
00:18:50 - an output that lines them up and separates them by tab characters
00:18:54 - here. See so we have 10 blue, 10 socks, 20 red, 20 shirts,
00:18:59 - 30 purple, 30 pants. What it's done is taken
00:19:02 - this second file and lined it up after the first,
00:19:08 - so it's kind of like it pasted it, which is the name of the program,
00:19:11 - and it's pasted it right at the end of each line, lining up line
00:19:14 - for line. It's a really neat way to reformat text and like I said, kind
00:19:18 - of difficult to do inside of a text editor, because you're actually
00:19:22 - putting every line after another line in a different file.
00:19:25 - So anyway, that's how paste works. It's just
00:19:28 - a neat little program and it does some things that are
00:19:31 - kind of rough to do, but it does it really quickly. So let's move on, that's paste.
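The paste demonstration, sketched with the same 1.txt and 2.txt contents read out in the join example:

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf '10 blue\n20 red\n30 purple\n40 orange\n50 black\n' > 1.txt
printf '10 socks\n20 shirts\n30 pants\n40 hats\n50 gloves\n' > 2.txt

# Line N of 1.txt, a tab, then line N of 2.txt.
paste 1.txt 2.txt
```

Unlike join, paste does not look at the contents at all. It pairs lines purely by position, which is why the key appears twice on each output line.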
00:19:36 - Next we have a really simple program, and it's called pr,
00:19:40 - all right. So let's look and see what it does, man pr: it converts a text file,
00:19:45 - or text files, for printing. All right, so
00:19:49 - this output then is ideal for printing. Doesn't sound like a big deal,
00:19:53 - because all text should print well. Well, that's not entirely true:
00:19:56 - if it's a really, really long file, it won't
00:19:58 - print on one page, and that's where the pr command helps. Let's run
00:20:02 - it on hello.txt.
00:20:05 - Oh no, nothing. Well actually there is something, but it's printed
00:20:08 - the entire page, so we have to scroll up a little bit to see. And this
00:20:12 - is what it's done, it makes a nice header on top, the date and time, the file
00:20:17 - name, page one and then it puts it all on the page there. And
00:20:21 - you can do this for a couple files. Let's do
00:20:25 - pr hello.txt and hello2.txt, and we're still going to have
00:20:32 - to scroll up, but see what it's done is it has done both of these,
00:20:38 - see. This is our first one, it's created a page for each file. Now
00:20:42 - again we didn't concatenate them together, so it's created
00:20:45 - a new page for each file. But there's our first one and scroll
00:20:49 - down, here is hello2.txt. And it's created this printable
00:20:53 - page. Again, it's not an amazing program necessarily, but
00:20:57 - it will do things in pages. Let's do something a little
00:21:01 - bit bigger, pr, we'll go back to that /var/log/syslog. Let's say we wanted
00:21:07 - to print that out and have it be, you know, page by page by page,
00:21:10 - it'd be very readable. Well if we do that, it's going to create much
00:21:14 - more output, but let's scroll up and see what kind of thing
00:21:16 - we have. All right,
00:21:19 - see at the top of every page it's going to do this, it's going
00:21:22 - to make it fit into one page. You'll see we have page six and
00:21:26 - up here we have page five. So it'll break it up into pages so that it
00:21:32 - doesn't just run off the end of every page. So it is a useful
00:21:35 - tool, but not terribly complex in how it works or what
00:21:38 - it does. But that's pr, another text manipulation program.
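A sketch of what pr is doing, assuming the default 66-line page. Because the syslog file differs from machine to machine, long.txt below is a hypothetical 200-line stand-in.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'This is a test\na big test\n.\n' > hello.txt
seq 1 200 > long.txt      # hypothetical long file standing in for a log

# The top of the page: blank lines, then a header with date, name, page number.
pr hello.txt | head -n 8

# Even a 3-line file is padded out to one full page, hence the scrolling.
pr hello.txt | wc -l

# A long file is broken into several pages, each with its own header.
pr long.txt | grep 'Page'
```

With 5 lines of header and 5 of trailer on each page, 56 lines of text fit per page, so the 200-line file comes out as four pages.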
00:21:44 - Okay, next we're going to look at a very, very, very powerful
00:21:47 - tool. It's called s-e-d, or sed, and this is the stream
00:21:52 - editor. Now what this does, it allows you to basically edit
00:21:57 - an entire file based on a command string that you provide. Now
00:22:01 - this is really, really amazing. You can use regular expressions
00:22:05 - and all kinds of stuff. I'm going to do a very simple
00:22:09 - example of how powerful sed can be. So let's look at
00:22:14 - the 2.txt file that we created, right. So we see
00:22:17 - that's what's there. Now if we were to run sed
00:22:21 - -e, for expression, and then in single quotes, we're going
00:22:25 - to put the command. We're going to substitute and then what
00:22:29 - we're going to do is replace pants with dress, okay,
00:22:37 - or dresses. We want to keep the plurality the same there, okay. And we're going to
00:22:40 - close that, and all of that goes in single quotes. So basically what
00:22:44 - this is saying is, we're going to use the stream editor, we're
00:22:46 - going to give it an expression and then we put our editing string in here. We're going
00:22:50 - to replace pants with dresses in the file 2.txt. So
00:22:56 - what this does, every occurrence of pants is going to be
00:23:00 - replaced with dresses, and see, that's what it did,
00:23:03 - all right? So what else could we do? Now there's just one occurrence in
00:23:07 - there, but let's take this same file. And instead of replacing
00:23:12 - pants with dresses, let's replace zero with, I don't know,
00:23:19 - chicken. Now if we do that, what would you expect is going
00:23:23 - to happen? Well it's going to do this, 1chicken socks, 2chicken shirts.
00:23:27 - It substituted every occurrence of zero with chicken.
00:23:31 - All right, so this stream editor is amazingly powerful, because you
00:23:35 - can, oh gosh, you can do so many things with it. We've
00:23:40 - talked about redirecting a little bit, I'm going to give you a little
00:23:43 - bit more complex example, all right. So we know what the
00:23:48 - output from that's going to be, what if we then piped the output
00:23:52 - back into sed -e, and then with that we're going to
00:23:59 - substitute every occurrence of c-k-e-n
00:24:05 - with m-p,
00:24:11 - all right. So now let me walk you through what's going on
00:24:14 - here, all right. We're going to type the same thing up here, so we're going to
00:24:18 - take all of these and we're going to replace every zero with chicken.
00:24:22 - And then we're going to take this output, 1chicken socks,
00:24:25 - 2chicken shirts, we're going to put that through this sed command
00:24:29 - and replace c-k-e-n with m-p, all right. Now what should
00:24:33 - the output here be?
00:24:35 - Now what we've ended up with is 1chimp socks, 2chimp shirts, because
00:24:39 - see we've taken off that c-k-e-n and we've replaced it
00:24:42 - with m-p.
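Written out, the sed commands described here would look something like the sketch below. The exact commands typed on screen are not in the narration, so the s/old/new/ expressions are reconstructed from the description, and 2.txt uses the contents read out in the join example.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf '10 socks\n20 shirts\n30 pants\n40 hats\n50 gloves\n' > 2.txt

# s/old/new/ replaces old with new on each line of the stream.
sed -e 's/pants/dresses/' 2.txt

# Replace the 0 in each number with "chicken".
sed -e 's/0/chicken/' 2.txt

# Pipe the result into a second sed that turns "cken" into "mp".
sed -e 's/0/chicken/' 2.txt | sed -e 's/cken/mp/'
```

Two details are worth adding. Without a trailing g flag (as in s/0/chicken/g), the s command only replaces the first match on each line, which gives the same result here because each line has just one match. Also, sed prints the edited text and leaves 2.txt itself untouched.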
00:24:44 - So it's a really, really, really, really powerful tool for doing
00:24:49 - stuff like that. But it's easy to get into trouble, because,
00:24:53 - let's say we replaced the letter c with something. Well, it would replace
00:24:57 - this c and this c and this c as well and that's not what we
00:25:01 - want to do. So it's a powerful tool, it's something you have to be careful
00:25:04 - with, but sed is something that's used a lot in scripts
00:25:08 - and that kind of thing, to edit files using, you know, a search and
00:25:12 - replace type thing or regular expressions, which we'll talk about later.
00:25:15 - So sed, it's a very, very important command and a really neat
00:25:19 - tool to learn and understand. Okay, so that's sed, let's talk
00:25:23 - about sorting now. Let's clear the screen and we're going to
00:25:26 - look at the sort command. Now sort does just what the name
00:25:30 - would imply, it sorts the lines in a text file. So it's going
00:25:34 - to take every line and sort it. Now by default, it's going
00:25:38 - to sort it alphabetically, but we can change that. We can
00:25:42 - change it based, you know, if there's months in there, we can change
00:25:44 - it based on months or reverse, you know. We can do all sorts of different
00:25:49 - things, we can merge already sorted files and not sort them, all
00:25:53 - kinds of stuff. But let's just do a few examples here. Well
00:25:56 - if we type sort, oh, first of all, let's look at what we have. I
00:26:00 - know we've looked at this file a lot, but just so we know. This
00:26:03 - is the file, this is a test, a big test, and period, okay. So what if we
00:26:09 - sorted this?
00:26:14 - Well, it's going to sort it alphabetically, and punctuation, according to the
00:26:18 - computer, comes before letters, so the period, then a, and then t, okay. So it's sorted
00:26:23 - that, and it actually ends up being completely backwards for us,
00:26:25 - but that's just because of the way the file happened to be. It's sorting it
00:26:28 - alphabetically. Let's look at sort 1.txt.
00:26:34 - Well that looks exactly like it does
00:26:37 - if we just look at it, because we've already put that in order,
00:26:41 - see, one, two, three, four, five, one, two, three, four, five, it's already in order. Well let's
00:26:45 - see some of the other things that sort can do. Let's clear the screen.
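Because 1.txt is already in order, sorting it shows no visible change. The sketch below uses a hypothetical shuffled copy of the same lines so the effect is easier to see, and also shows the reverse option (-r), one of the other things sort can do.

```shell
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical out-of-order version of the lines in 1.txt.
printf '30 purple\n10 blue\n50 black\n20 red\n40 orange\n' > shuffled.txt

sort shuffled.txt       # default order: 10 blue first, 50 black last
sort -r shuffled.txt    # reversed: 50 black first, 10 blue last
```

The order of lines that mix upper case, lower case and punctuation can depend on the locale settings of the system, so results for a file like hello.txt may vary from one machine to another.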
00:26:48 - And what if we do sort, if you looked at the man page, dash lower case r
00:26:52 - is going to sort it in reverse. So let's do that and we should
00:26:57 - see it sorted it backwards, started with 50, went down to 10, all right. Pretty
00:27:02 - neat. Now you have to be careful, remember we've talked about
00:27:05 - commands being case sensitive, because if you do sort
00:27:09 - dash capital R 1.txt, this does a random
00:27:15 - sort. So if we do the same thing twice, we're not going to get
00:27:18 - the same answers. See, it randomly puts them in order or actually
00:27:21 - randomly takes them out of any sort of semblance of order. So
00:27:25 - it just randomizes the lines, which is a neat way to mix stuff up, but
00:27:29 - my point here was, I want you to make sure you know that
00:27:32 - upper case R and lower case r, are drastically different things,
00:27:36 - okay. But sort is a way that you can sort the different lines
00:27:40 - in a text file to a way that seems most pleasing or seems
00:27:44 - to make the most sense for you, all right. So that's all there is
00:27:46 - to the sort command. Okay, as we go alphabetically down our list
00:27:50 - of commands, next is the split command. Now what this does,
00:27:56 - it does what again you expect, it splits the file into
00:27:59 - pieces, all right. Now it's not going to do much good
00:28:03 - to split it into pieces and just show us on the screen, right.
00:28:06 - It's going to split it into a bunch of different files. Now
00:28:10 - you can tell it how to do that: you can break it apart based on number
00:28:13 - of bytes, with the -b option. You can do a number of lines
00:28:18 - with the -l option and we're going to do that, okay. We're going
00:28:21 - to show, now let's clear the screen, we'll see what we have here. We
00:28:25 - have our trusty files that we've been working with all along.
00:28:28 - So let's split. Now let's do this based on lines at first,
00:28:33 - we're going to split into files with two lines in each
00:28:36 - file, all right, 1.txt. Now there's some numbers in a row there,
00:28:40 - but don't be confused, it's split -l for lines, 2 is how many
00:28:44 - lines we're going to give it, and the name of our file is 1.txt.
00:28:48 - All right, I just wanted to clarify, because those numbers
00:28:50 - are kind of smooshed together. So if we type that, well it doesn't show anything
00:28:54 - on the screen because it's created files. So let's look again.
00:28:58 - It's created these files, xaa, xab, xac. And if we look
00:29:03 - at those, cat xaa, okay, two lines per file, that makes
00:29:07 - sense, right. Cat xab, two lines per file, cat xac,
00:29:13 - now there's only one left, so that one's going to have one, all right. So that's,
00:29:16 - you know, this is, this makes sense if you have something that
00:29:20 - is extremely long, like that syslog file that we created. We could split
00:29:24 - that into a bunch of smaller, more manageable files, all right.
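The line-based split just described can be tried with a small sketch like the one below. The five-line contents of 1.txt are an assumption reconstructed from the output described in the demo, and the scratch directory is only there to keep the generated pieces out of the way.

```shell
# Work in a scratch directory so the generated pieces don't clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed stand-in for the demo's 1.txt: five short lines.
printf '10 blue\n20 red\n30 purple\n40 orange\n50 black\n' > 1.txt

# -l 2 means "two lines per output file"; 1.txt is the input file.
split -l 2 1.txt

ls x*      # the pieces: xaa, xab, xac
cat xaa    # the first two lines
cat xac    # the single leftover line
```

With five input lines and two lines per piece, the last piece ends up with only one line, which matches what the demo shows.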
00:29:28 - Let's get rid of all of these, because I want to show you something
00:29:30 - else. So if we looked at, see I've erased all of those files starting
00:29:34 - with x, let's clear the screen again just to get a fresh start. Instead
00:29:38 - of splitting by lines, what if we split by bytes? Split by
00:29:43 - bytes, we'll say, we'll split it into files that each have five bytes of
00:29:48 - information in it. And we're going to do this file, 1.txt again,
00:29:52 - all right. So, now five bytes is not a whole lot, right. So this
00:29:57 - should give us more files, let's see what it does.
00:30:01 - Yes, it's given us a whole bunch more files, we have xaa,
00:30:03 - xab, xac, xad. And those files should all be
00:30:10 - five bytes long. Now you'll notice this is kinda goofy, and why
00:30:15 - this is special, it's given us 10 or five bytes, right, every
00:30:19 - character is a byte. So you have one character, two characters, three characters,
00:30:24 - four characters, five characters and that new line character
00:30:28 - at the end of a line isn't part of this file, okay. It only
00:30:32 - gave us five bytes, which doesn't include the new line that
00:30:35 - would make it go to the next line like that.
00:30:38 - Oh, so that's interesting.
00:30:40 - Let's hit enter a couple times, give us a line. So what if
00:30:43 - we cat xab?
00:30:45 - Well now this one gave us a new line, didn't it? Now what happened here? Well
00:30:51 - we can describe this here, again, up here it was one zero
00:30:55 - space b-l, so the next characters would be ue for blue and
00:31:00 - then a new line character, which brought us down here and then
00:31:05 - two zero, but that's it, that's five bytes, right? We have one byte for the
00:31:08 - u, one byte for the e, one byte for the new line character, which
00:31:13 - brought us here, one byte for the two, one byte for the zero, but then
00:31:16 - there's no new line character, so it didn't, it didn't give
00:31:20 - us this carriage return kind of thing. But that's what split
00:31:23 - does, it splits it up into a bunch of different files, based
00:31:26 - on the format that you tell it, all right. So you can also
00:31:29 - specify what these look like, this is just the default, it does
00:31:32 - x and then the, you know, it increments in two letter
00:31:36 - increments like that. So here, let's
00:31:41 - get rid of these.
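The byte-based split and the output naming can be sketched as follows. The two-line file is an assumed stand-in for the start of 1.txt, `od -c` is used here only to make the newline byte visible, and the `part_` prefix is a made-up name for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed first two lines of the demo file (15 bytes in total).
printf '10 blue\n20 red\n' > 1.txt

# -b 5 means "five bytes per output file"; a newline counts as a byte too.
split -b 5 1.txt

# od -c prints each byte, so the newline inside xab shows up as \n.
od -c xaa    # 1 0 (space) b l
od -c xab    # u e \n 2 0

# An optional last argument replaces the default "x" prefix.
split -b 5 1.txt part_
ls part_*    # part_aa, part_ab, part_ac
```

The split points fall wherever the byte count says, so a piece can start or end in the middle of a line.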
00:31:43 - And I'll show you, see if we did something like that to, let's
00:31:47 - say, split
00:31:49 - by lines, 10 lines. So there's going to be 10 lines in
00:31:55 - every file from /var/log/syslog. Now this is a huge text
00:32:01 - file, right, so it's going to give, oh, it's not a huge file,
00:32:05 - it must have recently rotated, oh my goodness, okay. Well that kind of made
00:32:09 - a liar out of me, didn't it?
00:32:11 - Xaa, see there's 10 lines, yeah, it just recently
00:32:19 - rotated,
00:32:21 - it archived this into syslog.1, syslog.2, so that's not
00:32:24 - a long file, doggone it.
00:32:26 - Well let's look at one that's a little bit,
00:32:31 - let's try dmesg. That one looks a little bit longer, let's
00:32:35 - try that one.
00:32:37 - So split --lines 10 /var/log/dmesg.
00:32:46 - That's better, see all these files it gave us? It split them
00:32:50 - into a whole bunch of files all 10 lines long, let's look at
00:32:54 - one, cat xba. See it's going to be 10 lines out of that
00:32:59 - dmesg file, all right. So that's how split works, you can break up things
00:33:03 - into more manageable files. And obviously you can make a fool
00:33:07 - of yourself if you assume that /var/log/syslog is a very
00:33:10 - long file. But that's how it works and that's what split will do
00:33:14 - for you. Okay, next up we have the tail command, which is just
00:33:19 - like the head command, only this does the end of the file. This
00:33:22 - is one that you tend to use a lot more when looking at,
00:33:25 - if you're looking at, like a log file kind of thing. So let's
00:33:30 - do that actually, let's tail
00:33:33 - /var/log/dmesg and it's going to show us the last
00:33:37 - 10 lines of that file, okay. We can change that number of lines,
00:33:42 - we can do tail -n 5 /var/log/dmesg,
00:33:49 - and it's just going to show us the last five lines, all right.
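To try the line counts without depending on what happens to be in a real log, here is a small sketch using a hypothetical 20-line file of numbers (the file name and contents are assumptions for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 20-line file containing 1, 2, ... 20, one per line.
seq 1 20 > numbers.txt

tail numbers.txt             # default: the last 10 lines (11 through 20)
tail -n 5 numbers.txt        # the last 5 lines (16 through 20)
tail --lines=5 numbers.txt   # long form of the same option
```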
00:33:53 - So that's how tail works. Now there is a really neat thing, tail
00:33:56 - will also, if you clear the screen, tail -f /var/log/dmesg,
00:34:03 - okay. What this shows us, it shows us the last 10 lines,
00:34:08 - but it keeps on showing it, waiting for something
00:34:13 - to happen. Now let's see if I can make something happen. I
00:34:16 - will, how about I unplug
00:34:19 - the network.
00:34:21 - Unplug the network,
00:34:24 - disconnected. Okay, it doesn't show up in dmesg, that's going to
00:34:27 - show up in /var/log/syslog, so let's close that. Let's do
00:34:32 - tail -f /var/log/syslog, all right.
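Follow mode can also be tried without touching the real system logs. The sketch below is an illustration under assumptions: `demo.log` is a made-up file, a background job plays the part of the system writing a new entry, and `timeout` (from GNU coreutils) stands in for pressing Ctrl-C, since `tail -f` never exits on its own.

```shell
workdir=$(mktemp -d)
log="$workdir/demo.log"          # hypothetical log file for this sketch
printf 'first entry\n' > "$log"

# In the background, append a new line after one second.
( sleep 1; printf 'second entry\n' >> "$log" ) &

# tail -f prints the existing lines, then keeps waiting for more.
# timeout stops it after three seconds and exits non-zero, hence "|| true".
followed=$(timeout 3 tail -f "$log") || true
wait

printf '%s\n' "$followed"        # both entries, including the one added later
```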
00:34:37 - So here, okay now we're seeing the last 10 lines and it's going
00:34:42 - to, I'm going to plug this back in and we should see it automatically appear,
00:34:48 - there, see, as soon as something happened, it showed up on the
00:34:51 - screen. So that's a really, really awesome tool if you're trying
00:34:54 - to watch your log files for changes, you just type tail
00:34:58 - -f, and it will show you, in real time, as things get added
00:35:03 - to the end of that. So
00:35:06 - again, there's some other things you can do, you can just show the last
00:35:09 - several bytes, as opposed to lines, usually lines is what
00:35:12 - you do. But follow is the option that I just showed you and it just
00:35:17 - follows it as it grows, which is really, really cool. But anyway
00:35:20 - that's the tail command, generally to look at the end of a file
00:35:23 - or in this case you can watch the end of the file as it gets
00:35:26 - written to, all right. So that is tail. Okay, our next command is one of
00:35:31 - those that doesn't seem very powerful, it's t-r, which is short
00:35:35 - for translate. So the tr command does a couple things,
00:35:39 - it will translate from SET1 to SET2 and it's a really
00:35:42 - confusing command. So let me just demonstrate exactly what we're
00:35:45 - talking about. Now I'm going to skip ahead a few nuggets and
00:35:49 - I'm going to use the pipe command. So here, if we do,
00:35:53 - I just want to, this is just a two second intro to piping, but if I
00:35:56 - type echo and then "HELLO", it's going to return HELLO, all in caps.
00:36:00 - Well if I want to use that
00:36:03 - output as the input for another command, I do this pipe symbol and
00:36:08 - that's how I'm going to demonstrate the tr command for you. So HELLO, all in caps,
00:36:13 - we're entering that into the tr command. And I'm going to use the -t
00:36:18 - flag here. Now strictly, -t means truncate SET1 to the length of SET2;
00:36:20 - translating is what tr does by default with no flag at all, so with
00:36:24 - equal-length sets the -t changes nothing. And basically it translates all the characters
00:36:29 - in the first set to the ones in the second, it just takes two sets of numbers or letters
00:36:32 - or characters. We can do A-B-C-D-E, oop, D-E-F
00:36:38 - G-H-I-J-K-L-M-N-O-P
00:36:43 - Q-R-S-T-U-V-W-X-Y and Z. The sad thing is I
00:36:50 - didn't use caps locks, I have no idea why I didn't, I just
00:36:52 - held the shift key the whole time. Anyway, there is also a shorthand, you
00:36:55 - can do a-z, or like we would have done, A-Z,
00:37:00 - for what we had just typed, but I wanted to show you how to do it
00:37:03 - all that way. But we'll do the second set a through z. So now what's going to
00:37:06 - happen, it's going to translate, it's going to take all these
00:37:09 - upper case letters and translate it from
00:37:13 - upper case to lower case. This is really convenient in a script
00:37:16 - if you want to lower case or change something to lower case,
00:37:19 - so let's hit enter, and see our output is hello. Now if that's not clear,
00:37:24 - let's do something very similar,
00:37:28 - "HELLO", we're going to pipe it into the tr command, but we're only going
00:37:34 - to translate L,
00:37:36 - and l.
00:37:38 - So all we're doing now is we're piping this, so we're putting
00:37:43 - the all upper case HELLO, we're translating it with a -t
00:37:46 - flag, so that all the upper case L's come out as lower case l's,
00:37:51 - all right. Now we should see,
00:37:54 - yes, so these two l's are now lower case instead of upper case.
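The two translations above can be sketched like this. Translation is what tr does when given two sets, so no flag is used here; the variable names and the character-class form on the third line are additions for illustration.

```shell
# Translate every upper case letter to its lower case counterpart.
lower=$(echo "HELLO" | tr 'A-Z' 'a-z')

# Translate only the upper case L; every other character passes through.
only_l=$(echo "HELLO" | tr 'L' 'l')

# The same upper-to-lower conversion written with POSIX character classes.
classes=$(echo "HELLO" | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$lower" "$only_l" "$classes"   # hello, HEllO, hello
```

Quoting the sets keeps the shell from treating characters like the square brackets as filename patterns.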
00:37:58 - Now tr does a couple other things too, you can translate from
00:38:01 - SET1 to SET2, even if they're a little bit different.
00:38:04 - For instance, it doesn't have
00:38:06 - to just be ABCDEFG to blah, blah, blah, blah, blah, it could be something that's,
00:38:13 - I'll show you what I mean here,
00:38:16 - pipe tr -t, if we replace
00:38:21 - A-B-C-D-E-F-G-H-I-J-K, oop, J-K-L, and we replace it with
00:38:30 - something like that,
00:38:35 - we should end up with gobbledygook. Yes, see.
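The position-by-position mapping can be illustrated with a sketch like the one below. The replacement set is hypothetical (the one typed in the demo is not spelled out); it is chosen only so that its eighth character is an o.

```shell
set1='ABCDEFGHIJKL'
set2='qwertyuoasdf'   # hypothetical replacement set; its eighth character is o

# Each character in set1 is replaced by the character at the same position
# in set2: H (8th) becomes o, E (5th) becomes t, L (12th) becomes f.
# The final O is not in set1, so it passes through unchanged.
scrambled=$(echo "HELLO" | tr "$set1" "$set2")

echo "$scrambled"     # otffO
```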
00:38:40 - So it's taken, it's lined up the H, which was the one, two,
00:38:44 - three, four, five, six, seven, eighth character and replaced it with
00:38:48 - the eighth character here, which happens to be that o. So
00:38:51 - it made it completely, you know, not very helpful, but that's
00:38:54 - just so you know how it works. You can also do things, if you
00:38:57 - remember back in here, there are other options like delete
00:39:01 - and squeeze, all right. So -d for delete and -s for squeeze,
00:39:06 - are two other ones I want to show you. So let's go back to,
00:39:12 - so now we're going to do tr -d and have it delete
00:39:16 - all of the capital Ls. So what should, I want to make sure I don't end up with
00:39:20 - a swear word here, yes, okay, so this should just return HEO and
00:39:24 - it did. See, it deleted all the L characters. Now the same thing
00:39:28 - if, instead of the delete flag, we do the squeeze flag. It'll squeeze all
00:39:34 - the repeating L characters, I chose L because that's the
00:39:37 - only repeating character we have here, but it will replace all
00:39:40 - of that or it'll squeeze them all together. See so we have HELO.
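Delete and squeeze can be sketched side by side. The first two lines mirror the HELLO demo; the third, with a made-up string full of extra spaces, is an assumption added to show a more practical use of squeeze.

```shell
# -d deletes every character that appears in the set.
deleted=$(echo "HELLO" | tr -d 'L')

# -s squeezes each run of a repeated character from the set down to one.
squeezed=$(echo "HELLO" | tr -s 'L')

# Squeezing runs of spaces down to a single space.
tidy=$(echo "too    many     spaces" | tr -s ' ')

printf '%s\n' "$deleted" "$squeezed" "$tidy"   # HEO, HELO, too many spaces
```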
00:39:44 - Now why would that be practical? Well if you had a file with
00:39:48 - a whole bunch of spaces in it or a whole bunch of new line
00:39:53 - characters and you wanted to squish them all down to one, well
00:39:56 - what you can do is
00:39:58 - do this squeeze all of the, like return characters or squeeze
00:40:02 - all the spaces together into one. And you can do that from within
00:40:05 - a script with a bunch of different text. So it doesn't seem
00:40:08 - like a really useful command right at first, but once you
00:40:12 - see what kind of power it has, it is something that if
00:40:15 - you're a scripter, you'll end up using the tr command or
00:40:18 - translate, all right. Okay, next we're going to look at a command called
00:40:22 - unexpand, unexpand. Now if you remember earlier on in this nugget, we looked
00:40:27 - at the expand command, which converted tabs into spaces. Well
00:40:31 - this does exactly the opposite, there's a few tricks to it though.
00:40:34 - Now I've created a file called hello3.txt and you can
00:40:40 - see that this is actually lying, I just,
00:40:43 - basically to create this file, I used the expand command from
00:40:46 - earlier. And instead of tabs in between here, it's just spaces.
00:40:51 - See this is separated by spaces
00:40:54 - that we actually used the expand command to create. So what I want to do
00:40:57 - is run unexpand on it, but there is a gotcha. See if we just
00:41:01 - run unexpand hello3.txt, it's not going to
00:41:06 - do what we would expect.
00:41:09 - There are still spaces, because by default, the unexpand command,
00:41:14 - if we look in the man page,
00:41:17 - it only does it
00:41:20 - on the initial blanks, only if there's blanks at the beginning.
00:41:23 - So what we want to do, is do this -a, for doing all
00:41:26 - the blanks, so it will convert the entire file.
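The difference between the default behavior and -a can be seen with a small sketch. The file below is hypothetical: one line that starts with eight spaces and one line with a run of six spaces in the middle. `od -c` is used here to make tabs visible as \t; it is not necessarily the trick used in the demo.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 8 leading spaces on line one, 6 interior spaces on line two.
printf '        indented\n10      blue\n' > spaces.txt

# Default: only the leading run of blanks becomes a tab.
unexpand spaces.txt | od -c

# -a: runs of blanks that reach a tab stop are converted anywhere in the line.
unexpand -a spaces.txt | od -c
```

Both runs end exactly on the default tab stop at column eight, which is why each one collapses into a single tab.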
00:41:29 - So if we do unexpand -a for all the blanks, hello3.txt,
00:41:35 - again, it looks
00:41:36 - the same, but we can use this trick to see that what
00:41:39 - it's done, yes, it's created tabs in between there, instead of
00:41:43 - spaces. See so these are tabs, now whereas up here, they're individual
00:41:48 - spaces, now they're tabs. So that's how the unexpand command works. Exactly
00:41:53 - the opposite of the expand command, but you have to know that
00:41:56 - if you want it to do it in the entire string, you need to use that -a
00:41:59 - flag, all right. And just like the other one, there's a bunch of
00:42:02 - things you can, you know,
00:42:05 - you can set exactly how long a tab is, the default is eight characters.
00:42:10 - You can do, you know, some more specific things, but just
00:42:13 - know that it converts spaces to tabs. Okay, our next command
00:42:17 - is the uniq command. It's like unix, but with a q.
00:42:22 - And this will show different lines in a file that
00:42:26 - are unique or duplicated and there's a few options. By default
00:42:30 - what it does, is if there's duplicate lines in a row, it'll just show
00:42:34 - you one of them, all right. If you do the -c, actually, I'll show
00:42:38 - these to you in a second, but it will show you the number of occurrences
00:42:42 - of each line. It will show, you can, here if we scroll down a little bit more, you can
00:42:48 - show only lines that are unique, lines that aren't repeated,
00:42:51 - there's a bunch of different things you can do.
00:42:54 - We're going to show you a few here. Now I modified one of our
00:42:56 - handy dandy text files, so let's look at 1.txt. All right,
00:43:00 - so what I've done here, 10 blue, 20 red, 20 red, so that's
00:43:03 - a duplicate, 30 purple, 40 orange, 50 black, 50 black, all right.
00:43:07 - So you can see what data set we're working with right here.
00:43:11 - Now if we type uniq 1.txt with no flags,
00:43:17 - it's going to show us, it's going to lump together duplicates
00:43:21 - and just show us the unique lines. So we should just see 10, 20, 30, 40,
00:43:25 - 50, see those that weren't unique, that were duplicated,
00:43:28 - it just mashed them together, so it just showed us one copy of each.
00:43:31 - So if there's a bunch of repeats, it won't show us all of that. Now
00:43:36 - uniq -c does a really cool thing.
00:43:40 - That will show us how many occurrences of each line there are.
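A sketch of the default behavior and the -c flag, using the seven lines read out for the modified 1.txt. The awk step is an addition that pulls out just the count column, since uniq pads its counts with leading spaces.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The modified 1.txt from the demo: "20 red" and "50 black" appear twice in a row.
printf '10 blue\n20 red\n20 red\n30 purple\n40 orange\n50 black\n50 black\n' > 1.txt

# Default: adjacent duplicate lines collapse into one, leaving five lines.
uniq 1.txt

# -c prefixes each output line with how many times it occurred in a row.
uniq -c 1.txt

# Just the counts, one per line: 1 2 1 1 2
uniq -c 1.txt | awk '{print $1}'
```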
00:43:45 - So 10 blue occurs one time, 20 red occurs two times and same
00:43:50 - thing with 50 black, see how that showed us the counts with that
00:43:53 - -c flag? Pretty neat, huh? All right, so we can show only the
00:43:59 - real, let me look one more time, so I don't use the wrong command or wrong flag
00:44:03 - here. We can show only those
00:44:07 - that are duplicated, so -d will show only what is
00:44:10 - duplicated, or again, either flag'll work, --repeated,
00:44:15 - and then the other one is -u, which will only show the ones
00:44:18 - that don't have repeats. So -d or -u, or --repeated and
00:44:22 - --unique, and I'll show you what I mean by that. Let's clear the screen.
00:44:26 - Let's look at the file, so again we know the text that we're
00:44:29 - working with, if we do uniq -, oop, -d,
00:44:37 - that will only show us those lines that are repeated. So 20 is
00:44:42 - repeated more than once, it's going to show us 20, 50 is
00:44:45 - repeated more than once, it's going to show us 50. And then
00:44:49 - uniq -u will do exactly the opposite. It's only going
00:44:53 - to show us those lines that are unique in the file, that don't
00:44:56 - have a duplicate right next to them.
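The two filters can be compared directly with a sketch on the same seven-line data set (the file is recreated here so the example runs by itself):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '10 blue\n20 red\n20 red\n30 purple\n40 orange\n50 black\n50 black\n' > 1.txt

# -d (--repeated): print one copy of each line that is duplicated.
uniq -d 1.txt    # 20 red, 50 black

# -u (--unique): print only the lines that are not duplicated.
uniq -u 1.txt    # 10 blue, 30 purple, 40 orange
```

Together the two outputs account for every distinct line in the file, with no overlap.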
00:44:58 - So, see there? It didn't show us 20 or 50, because those weren't
00:45:03 - unique. Now a way that this could be useful is if you're looking
00:45:05 - through a log file to see if you have the same error over and over
00:45:09 - and over. You could easily go through and I don't think I'll
00:45:13 - find anything, but let's look for lines that are duplicated
00:45:17 - in /var/log/syslog. Yeah, there's nothing in there that has
00:45:23 - happened more than once. Let's look up messages. Nope, nothing
00:45:28 - in there either. See this is a good sign, right,
00:45:31 - dmesg, yeah, nothing in there. But
00:45:35 - we know that in our 1.txt file,
00:45:41 - oop, I forgot that d,
00:45:43 - -d 1.txt, that these two things occurred
00:45:47 - more than one time, so it'll show us those. So that's how the uniq command
00:45:50 - works. Again, it's spelled u-n-i-q, like unix with a q and it's
00:45:54 - a way that you can search log files for things that repeat.
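One detail worth keeping in mind when searching for repeats: uniq only compares each line with the one right before it, so duplicates that are not next to each other go unnoticed unless the input is sorted first. A minimal sketch with a made-up three-line file (the file name and messages are assumptions):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical events: "error B" occurs twice, but not on neighboring lines.
printf 'error B\nerror A\nerror B\n' > events.txt

# Prints nothing, because no two adjacent lines are identical.
uniq -d events.txt

# Sorting first brings the duplicates together, so uniq can find them.
sort events.txt | uniq -d    # error B
```

Real log lines usually start with a timestamp as well, which makes otherwise identical messages differ.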
00:45:59 - Okay, the last tool we're going to look at in the textutils package
00:46:02 - is wc for word count. Now what this does, it shows
00:46:07 - you three different things. It will show you how many lines
00:46:09 - are in a file, how many words are in a file and the total byte
00:46:13 - count, which usually means how many characters, because every
00:46:15 - character is a byte. But that could also mean new lines and tabs
00:46:18 - and things like that. So let's look at it in action. Now you'll see
00:46:21 - we have these files that we've been working with
00:46:24 - this whole nugget.
00:46:26 - And we'll do word count on 1.txt, all right. And it shows us,
00:46:31 - in a not terribly easy to read format, but it shows us the
00:46:35 - line count. So there's seven lines in the file,
00:46:39 - 14 words in the file and 60 characters in the file.
00:46:43 - Now you can just ask it to show one, like we just want to show
00:46:47 - the word count in 1.txt, and it'll say, okay,
00:46:50 - there's 14 words in 1.txt.
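The counts quoted above can be reproduced with a sketch on the same seven-line data set. Reading the file through `<` is a choice made here so that wc prints only the number, without the file name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '10 blue\n20 red\n20 red\n30 purple\n40 orange\n50 black\n50 black\n' > 1.txt

# All three counts at once: lines, words, bytes, then the file name.
wc 1.txt

# One count at a time.
wc -l < 1.txt    # 7 lines
wc -w < 1.txt    # 14 words
wc -c < 1.txt    # 60 bytes, newlines included
```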
00:46:52 - Now why does it show the file name right after there? What if you
00:46:55 - want to do wc *, show us all of them. Well this is
00:47:01 - kind of nice, we can compare different files in our group right
00:47:04 - here. So we can see that, it'll even do the total, so 17 total
00:47:09 - lines, 221 characters in all of these. But let's
00:47:12 - look, I want to kind of show you something here, do you remember
00:47:14 - we were working with expand and unexpand? Well you can see hello2
00:47:18 - and hello3 were the two files that we were working
00:47:20 - with. Well this hello2, here let's look at both
00:47:24 - of them, cat hello2.txt, and cat hello3.txt.
00:47:29 - All right, they look exactly the same, but remember we were
00:47:33 - saying that this one has tabs in it and this one just has spaces
00:47:37 - in it. Well the word count command kind of verifies that, okay. This
00:47:42 - one, each one of these tabs is only one byte, it's only one character,
00:47:46 - so it has fewer characters than hello3, which is full of all
00:47:51 - these spaces in there. See that? So wc, or word count, shows
00:47:56 - us again, that hello3 is a bigger file, because it has more
00:48:00 - characters, even though visually it looks the same, one tab has
00:48:03 - the same amount of bytes as one space, but it takes that many spaces
00:48:08 - to take up that room. Okay, so that finishes up this nugget.
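The tab-versus-spaces size difference can be checked with a tiny sketch. The one-line files here are hypothetical stand-ins for hello2.txt and hello3.txt, whose exact contents are not shown.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One line with a single tab between the two fields.
printf '10\tblue\n' > tabs.txt

# expand turns that tab into however many spaces reach the next tab stop.
expand tabs.txt > spaces.txt

wc -c < tabs.txt      # 8 bytes: "10", one tab, "blue", newline
wc -c < spaces.txt    # 13 bytes: "10", six spaces, "blue", newline
```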
00:48:11 - This is describing the textutils package which is installed
00:48:14 - by default in just about any Linux distribution. I just can't
00:48:17 - picture it not being installed, because these are tools that you use
00:48:20 - all the time, some more than others, like cat is one that you use
00:48:23 - just constantly. But anyway, you should now, after watching this
00:48:27 - whole nugget, understand what all of these commands do, be comfortable
00:48:30 - using them and you'll see that they're useful in scripting
00:48:34 - or sometimes just on the command line as a system administrator.
00:48:37 - Now do remember that some of them have flags that are
00:48:41 - like -f, but also
00:48:47 - --, then a word, okay. Either way, either flag will work, either
00:48:51 - the dash and the letter or the dash dash and the word. And sometimes
00:48:54 - you need to know one or the other, you could be tested on one
00:48:58 - or the other way of invoking a command flag. All right, so just
00:49:02 - keep that in mind when you go back through the man pages. And I hope
00:49:05 - that this has been informative for you. And I'd like to thank
00:49:07 - you for viewing.