Skip to main content

SQL HTML CSS WORKSHOP BY COURSERA SERTIFICATE ASHIEVEMENT 2020




SQL So after all that long-winded introduction, now it's time to actually start using SQL, but before we do, I want to give you a picture of where we're at. So this is the request-response cycle. We have a browser over here on the side. You are way over here, that's you on this side. You click on a link, the browser goes and makes a network request using HTTP to a web server, and then it maybe pulls a file in and maybe that file is PHP code. The PHP code starts to run, but then has to talk to the database server so it sends SQL across, that's what we're talking about now. And then this SQL server, MySQL software, is really smart about where to find all this stuff, and it sends the data back to PHP and the PHP loops through the data it got from MySQL server and write some more HTML, which goes back as the response. This is the request-response cycle here, and it parses this response, fills up this thing called the document object model. And here we are, sitting here and we see the new screen. And so that's the request-response cycle, but for today, for this lecture, we're really here. As a matter of fact, we're going to learn the language that's flowing back and forth across the network. Now, often these are on three separate servers, one, two, three, but on your laptop and on my laptop, they'll always just be one server. But it doesn't matter, we're still sending commands over. So we are learning in this next couple of lectures, the syntax of the SQL language that goes into the database server, gives it instructions and is used then to return the data from the database server. So we'll zoom in now. We're going to do two basic things. We have this piece of software, the database server and I keep saying how wonderful this software is and it's millions of lines of a highly-tuned computer sciencey code that also has itself some files that it works with. We don't ever look straight at these files, we would never do that. We send SQL commands in, the magic software does its thing and retrieves the data and then gives it back to us. And it turns out, because this is kind of a network, we can send these commands simultaneously even from multiple sources. And so this is like a server. Meaning that in this case, the PHP could be a client, MySQL could be a client. And the role that we're going to be in is we're going to be what's called a database administrator because usually, when you're writing software or running production software, we developers never get to touch the end user database, but for a while, so we can learn SQL, we're going to be the all powerful database administrator. So by now, I hope that you installed something like XAMPP or LAMP because you're going to need it for the PHP parts of this class, because that has a database server all built in, a PHP server built in, all kinds of things and it just saves you a long time. It has a big bunch of files. On Macintosh, we have a thing called MAMP and we have XAMPP on Windows. I'm not going to spend a lot of time on command line, but command line is really very important, especially if you're going to run stuff on servers, maybe with a Linux server. And so, I'll just show you a little bit of command line at the very beginning with the understanding that maybe you're running Linux and you're gonna use the command line the whole time, and I think that's really cool if you do. But most of us use the command line rarely and we tend not to do too much on our servers so we just use the command line to go in on the server and figure things out if something's not working. But when we're doing software development, we tend to use this software called phpMyAdmin. So let me show you some of that. So, here's MAMP. So MAMP has this control panel that can start and stop the MySQL server and Apache server. These are two programs waiting for network requests, and I can open this web start page. Like XAMPP, they all have these little control panels. You go to the starting page. And so this is running on port 8888. If you remember URLs, we're on localhost which is a loopback, so it's a web server that's running on the same server as this browser is. The outside world can't talk to this. Normally, port 80 is for web but this is going to run on port 8888 in case you're doing something because it's kind of a developer environment. So one of the tools is called phpMyAdmin. And so this is a console that database administrators, and if you run stuff in production, lots of people use this exact software. So we're not really playing in a toy environment. There are more sophisticated ways to do this but this is a very common tool that a lot of people use to do real production management of database servers. Okay? And so, we'll come to this in a second. We can type commands here but now, I'm going to show you how you can talk to the database server from the command line. I've got this handout. Let me grab the command line text from the handout and go into or it wants to test it. So, this is what you do in MAMP. It's really just a long path to a file called mysql. And I'm logging in with a user name of root and a password of root. This is not that big a deal because you can't really come in from outside into my database because a localhost is how this works. And so now I'm at the prompt, on a mysql prompt. And so, this mysql is connected to the same MySQL server that's running on port 8888 that here I am in this port. Actually it's on port 8889. The server's on port 8889. And so, I am also on port 8889. It already knew there's port 8889. So, there's commands that you can type here. Like I can type show databases;. So that's really the same information here as it is here. And, this software and the command line are simultaneously talking to the same SQL server. Neither of these are the SQL server, they are just clients that are sending commands back and forth. So watch what can happen here. I want to get rid of this database. Click on databases. Click check. Get rid of that database. Now, by the way, don't get rid of any of these databases that are here. If you didn't put it in, don't take it out. Those are internal databases that MySQL uses for its own purposes. Now I'm going to go back to the command line. I'm trying to show you how both pieces of software are talking to the same database server at the same time. So, last time I typed show databases; you saw People. If I type show databases; again, it's gone. I deleted it from one client and I see the effect of the deletion on the other client. So there can literally be tons of clients hitting the same database server. There is still one database server. And if I came over here to my MAMP console and I stop the servers, this is shutting down the database server. I hope I can get it back up after I shut it down. So, MySQL server is down and the Apache server is down. Now I'm going to say show databases; This gets out. That server has gone away. Let's see if I can bring it back. Start servers. Come back. Come back. SQL came back. You see little green dot. In XAMPP, it looks a little different. You might have trouble when you first get this set up. Get some help from somebody. If it comes up, great. You need a little bit of work, whatever. Now, I think I should be able to just hit up arrow and ask again show databases; because there we go. So, that is just really showing you that there are two ways of simultaneously talking. We're talking from phpMyAdmin or MySQL but both are doing SQL. Even if I do things in here, it's actually sending SQL commands back to the database server. Now, we're going to make a database. So I'm going to use the statement CREATE DATABASE and I've got this little file that's got all these commands here. Now if you're in the command line, you actually have to switch databases. I would do this not in the command line because I did the last time in the command line. I'm going to do this in phpMyAdmin. So I'm going to be here with SQL. Let me hit refresh here. People's gone. And that one I hadn't refreshed. Wait, I've got to create a database. Hang on. This is going to fail because I'm not in a database. CREATE DATABASE People; Is that in my little sheet? Oh, they're right here. CREATE DATABASE People; This default character set is probably a good idea. Default character set means that you can have non-Latin characters so Asian characters, Greek characters, Russian characters, et cetera. So that's a good idea to do in general so I'm going to go back and come over here and create my People database. Create database People with the right character set. And so now I have a database People. And now I'm going write some SQL. I'm writing this SQL in the database People because the table that I'm about to create is going to be in that database. Come back. CREATE TABLE. So, this CREATE TABLE statement, what this is is, remember I talked about a spreadsheet and the columns of the spreadsheet? This is making the columns of the table. But I told you that when we do this for a database, we are far more precise. We're going to make two columns, one is going to be called name, the other's called email. They're both variable length character fields capable of storing characters not just in the Latin character set, and their maximum amount of 128 characters. If you try to put 129 characters in, it blows up. And you say, "Well, that's pretty mean." They say like, "Whoa, no. We told it. We only wanted 128 characters." So we're establishing a contract at this point. And part of the reason for that contract is for the efficiency that's required so it can store precisely in a certain way because it knows it's never going to be asked to store more than 128 characters. If it was going to store a million characters, it would take a different approach to storing it, but we don't have to worry about that. We just say, "Our contract with you, Mr. Database Server, is 128 characters." So I'm going to type this CREATE TABLE Users and I'm creating it in the People database. People database. There we go. So now, I see that if I go into the People database. And I find out, I've got one table called Users and I can go look at it and I can look at the structure of the Users table. And there we go, we have two columns. Every time you see UTF8, what you are knowing is that we can put non-Latin characters in here, which is a good idea in general. So there's our first database with lots of color commentary. We already did the CREATE TABLE , The DESCRIBE is something that you can type in the command line. I'll type that in the command line. So now you see we have the People database to switch to, so all your commands are aimed at the database. You say use People; database. Multiple tables can be in one database and now I can say describe. Now the keywords like describe and select et cetera are upper lower case. I tend in documentation to make all the keywords uppercase and so that is the same as what we saw in phpMyadmin, where it said oh we've got two columns there. varchar variable length character up to 128 characters. Now I didn't create the table here, but that's because the database server has the table created. Okay, so we're done with that. You can actually create all these tables using WYSIWYG GUI like the editor inside of phpMyAdmin. That's possible. Okay So now we have a table and we're going to do some INSERT statements. And so this is our first bit of SQL. And so the INSERT statement actually is INSERT INTO. Now if really super programmers were doing this would be like, oh INTO is extra. There is a goal inside SQL for you to kind of cognitively look at it and go uh, that's what's going on. So INSERT INTO is kind of just pretty there for us to read it. It's rare there's a programming language that adds things to make it more readable. We are going to send this query. So there's a file somewhere and the database has this file. And so let me go to here and grab the INSERT statement. And I go into SQL, the database People and I'm going to say INSERT INTO Users, name, email, values, chuck and away I go. Okay. So now if I take a look I see that I now have one row with a name column and email column, easy enough. Now let's do it again. And it turns out that the way SQL commands work, is you can make more than one of them as long as you put a semicolon at the end. And so this is the next four INSERT statements. And I can type them all in at the same time and hit go. And it ran into the INSERTs because it has semicolon at the end of each one. And so now I have five lines sitting in there. So that's your first SQL command. Pretty straightforward. The next SQL command is the DELETE statement. And like the INSERT we think of DELETE FROM is like the keyword. Users is the name of the table. And we're going to see a lot of the WHERE clause. The way a lot of these SQL commands work is they operate against the whole table unless you limit them. So WHERE is saying don't do everything. Only do it WHERE email equals ted@umich.edu. And so this is like an IF statement in a programming language except that, it's like this whole thing is a loop. The DELETE is like go through that whole table and DELETE FROM Users but only if that happens. So you could think of that as like a loop that's going around but then there's an if statement in the loop. And then this is only this delete is only done to the records where this turns out to be true. So it's like writing a loop and an if statement all in one statement. And this is one of the reasons that people like this. So I go grab the DELETE statement. Now if I go look at the data, one row was affected. So I take a look and Ted's gone. So there's your second SQL statement. You're doing very well learning SQL. You're actually halfway through. So the U of CRUD, Create Read Update and Delete is Update. So here we have the UPDATE keyword, table to update. The SET keyword then a column name and a value with equal signs name equals Charles. And then again if this is an implied loop the UPDATE is updated everywhere I don't think of the word everywhere sitting here. Update all the rows in Users and set the name equals Charles except for those rows where name, email equals csev@umich.edu. So let's take a look at that, copy that and paste it in. Hopefully you're keeping up. [click] SQL. Go and now I take a look at my data and son of a gun ever right there, the name has been updated. And I hope right now you're thinking to yourself wow that's pretty easy because we're three quarters of the way of learning SQL. I'm sure there's a few other things beyond this but we're most of the way. Select. So it's Create Read Update and Delete. But the way we read stuff is by select. So we still call it CRUD for fun. So SELECT is the statement that says read a table and the first parameter is select. Then the columns and star means all the columns. This way if you don't know what the columns are, you kind of forget what the columns are, you could just say select * From is the keyword. Users of the table and you can ask. This is a good example of how you don't put a where clause in is going to get them all, because the select implicitly is a loop. Select all from users, and then if you want just one or two you can apply a WHERE clause to that so you just tack a where clause on to the end. So let's take a look at those two commands. Select * from users; go. So now we get all four tables. I can go and then I can say, select * from users where a name or email equals csev@umich.edu. And then I can type go. Oh I type too fast I type from backwards but that shows you what an error looks like. From. So that works. But you saw what an error is. Here's another thing let me just show you this. So let's take a look at our data really quick. We got four rows, here are other things. I must say this, select * from users where name equals Alex. Now you might think this is an error. And when we run it you're going to see that it's not at all an error. And what it says is you got no rows and it's true. What did you ask for. You said go through all the rows. Show me every single one, where there's a name of Alex and it did, because there was no name Alex. That's not a flaw. You asked for something and you got it. You could check to see how many rows you got back and sometimes we do a select, say did I get a row back, if you didn't get row back you might consider as a programmer that was an error but other than that not, it doesn't have to be in there. Okay we're making good good progress. So the select has a couple of features like you can change the order, so select * from users order by email. That will order the results by email. Воспроизведите видео, начиная с :19:26, и следуйте текстовым инструкциям19:26 Case is now sorting. Sorting is another thing databases do really really well. Hopefully you're just following along. Unlike claws there's another form of the WHERE clause it's like a wildcard where this is the letter E. Some characters before it letter E and then some characters after it. So I can type that one. Okay, so I'm going to type, select * from users where name like percent sign that matches any number of characters %e%. That really is saying show me an e anywhere in it. In this case I could put a semicolon here but since it's only one command it doesn't really need the semicolon. When you really need the semicolon if I'm doing more than one command here. And so this shows me all the people that have each in their name. So it's kind of a wildcard like selection. You can also order these things and put limits on them. And this is a common thing where maybe you're paging and I'll come back to this later when we will find it more useful. But you can read up on that. You can ask for the number of rows with a count. Count as a built in function, so what we're saying is don't give me the actual rows but instead give me the number of rows that would have been returned had I ran the select statement. So I can say, select count * from users. And that's going to tell me basically, how many records are in the Users table. And so it told me four. And so, at this point, we have actually hit all the CRUD. We've learned how to insert, we've learnt how to delete. We've learnt how to update, we've learnt how to select. We learned a couple of different things that we can do in select. At this point, you've kind of learned half of SQL, because that's really what it does. It looks like a great big spreadsheet with a billion things in it and super fast. But the next thing we're going to learn, well when we start moving things between more than one tables and explore the relationships between those tables, is when it actually becomes interesting. So coming up next we're going to talk about some more detail on how we set up these tables and different kinds of fields that we can put into DB.. So now, we're going to talk about different kinds of ways that we can describe the columns that make up our tables. So we're in a table and we're working on columns. Different kinds are text fields, binary data fields, numeric fields, AUTO_INCREMENT fields and other kinds of fields. Воспроизведите видео, начиная с ::25, и следуйте текстовым инструкциям0:25 So the first two columns I made of name and email were of type VARCHAR, which is probably one of the most common fields or character fields. CHAR and VARCHAR are character fields. They are aware of character sets, which meaning they can be Latin character sets or Asian character sets or Russian or Persian or whatever. A lot of what you're doing to the database server is you're giving it a clue that says, look, this might be 512 characters long, but it might also be 4. So come up with a way to store things that will be efficient both for 4-character strings and 512-character strings. If you don't say VAR, if you say CHAR, what you're saying is, this is 20 characters, and pretty much all the time it's mostly going to be 20 characters. Which means if it's only 5, it might not be efficiently stored, but it'd be very efficiently stored for things that are 20 characters. So, if it's somewhere between 5 and 500, you want to use a VARCHAR, and if it's pretty much always 20. Now, the lengths that you give to these things are absolute maximums. It's not like it compromises, like in the previous example. We did 128 and that's it, that's a contract. You're basically saying, please enforce upon me the rule that these are no more than 128 characters. Воспроизведите видео, начиная с :1:39, и следуйте текстовым инструкциям1:39 And so, there are some text fields that, the key to the text fields is that they're not indexable in the same way. Sorting, like the ORDER BY, doesn't work so well with them. But if you're putting in a blog post or a comment in Facebook, you'd probably put one of these things in. They all have a maximum length. The TEXT field is up to 65,000 characters, because it might be a small blog post, or MEDIUMTEXT, or LONGTEXT, you can have 4 gigabytes of text, of characters. Now, because this is character-set-aware field, they're real characters. And so, 65,000 Latin characters is the same as 65,000 Asian characters. And it's capable of handling all that. So the TEXT field and the VARCHAR field understand character sets. Now, the types that are very rarely used are byte types. And so, in the Latin character set, using ASCII, a character is 8 bits, that's just how big it is. And so, if all you're doing is Latin, one character is 8 bits. Whereas, if you're doing other Unicode things, characters can be up to 32 bits, whereas a byte is exactly 8. And so, you can tell, if it's binary data and you know, really, that it's only up through values from 0 through 255 characters, you can say, hey, I'd like a binary stuff, like a BYTE field or a variable length binary field, up to that many bytes. It's not really commonly used if you don't want to index, if you don't want to sort it. It's sort of not aware, it's very unaware of its content. But sometimes you're putting stuff in, you're pulling something off, perhaps a sensor or something, and you're just putting it in, and it's just a bunch of 0s and 1s. That's what this is used for. You can also store things like images or PDFs or videos in databases. It turns out that databases are pretty good at this, and these are called BLOBs, binary large objects. And they also have various lengths, depending on whether they're tiny, regular, medium or long. So it turns out the databases are really good at storing this stuff, but the problem you run into is that it starts to slow your database backup down quite a bit after a while. And so, what you find is that if you're doing medium-size things, like maybe a picture here and there, or a profile photo or something, we tend to put that in the database because then it's kind of with all the rest of the data about you. But if we're doing things like videos or large documents, we tend to find some other way to store them inside the computer, usually just as a file somewhere, and then we serve them up. But it's not bad to store binary objects like this in a database, except when it comes to your backup getting too large. Воспроизведите видео, начиная с :4:21, и следуйте текстовым инструкциям4:21 Integers, there's different sizes of integers, and you say, why? And the answer is well, we're going to have millions of these things. And so, if your integer is only really a number somewhere between 1 and 15, we don't need to store the same amount of space as if it's a 2 billion number, right? And so, we have tiny integers, small integers, normal integers, that's 0 through 2 billion. Like the INT is a 32-bit integer, if you're sort of nerdy like that. And then there's even larger big integers that you can store that take up more space. Integers are nice, especially the INT integer, the one 0 through 2 billion. They sort really fast. They take up not that much storage. They're easy to compare. They're easy to sort. They're used for indexes. And so, we tend to use integers in a lot of things. And we tend to prefer them over strings, as we'll soon see representing information. Floating point numbers. Floating point numbers work like all programming languages. If you've been using things like 98.6 or blah-blah-blah.14 or something, 6.02 x 10 to the whatever, you've been using floating points inside computers all along. The key thing about floating point numbers like 1.7, they're not perfectly represented because they're not really real numbers. They're floating point numbers, but they're approximations. So the way to think about floating point numbers, FLOAT is the small one, it's a 32-bit floating point number. It has a range of up to 10 to the 38th power. But the key is, is that no matter how big or small it is, there's only seven digits of accuracy. And so for things like temperatures or speed or things like that, we tend to find floating point numbers are perfectly good, because can you really measure speed to more than seven digits of accuracy? If you can do better than that, then you go up to 14 digits of accuracy. But after a while you're like, okay, 14 digits of accuracy is probably enough. You just have to understand that you don't want to store money in a floating point, because $10.25 is not perfectly represented, and so you get inaccuracy. So you actually use scaled integers to store money, it turns out. But for scientific calculations, and you put plenty of scientific data into databases, floats and doubles are great. Воспроизведите видео, начиная с :6:41, и следуйте текстовым инструкциям6:41 There's a number of different time and date formats. There's a thing called the TIMESTAMP. And the TIMESTAMP is the number of seconds since 1970, and it's stored in a 32-bit number. And it's really efficient and easy to sort, because it's really an integer. But it has a couple of problems. It can only handle seconds, it can't go below a second. And it has an absolute sort of length. Being that the TIMESTAMP idea was invented in 1970-something, they said, it's the number of seconds since 1970, January of something 1970. Which means that we have kind of a Y2K problem in Unix systems and in database systems in 2037, because that's when the 2 billionth second after January of 1970 happens. But until then, they're fine. And then we'll make it a 64-bit number, and then we'll have until the sun dies or something like that, so we don't have to worry about that too much. So, if a data item that is only per second, like when a record was updated or created, we tend to use TIMESTAMP for all that. DATETIME is more general, takes up a little bit more space, and it can really represent any year, like the year 1300, you can still represent them. because you can't put 1300 in a TIMESTAMP, because it's before 1970. So DATETIME takes up a little more space, but literally any number that you can represent in this format, and it's beyond 2037 and it'll be fine. And the same is true for a DATE which is just any set of year-month-days that fit, and a TIME, that is hours, minutes and seconds. And there is a built-in function in MySQL that tells you the current date. So it would be, INSERT created_at value NOW(). So, NOW is a way to say what time is it now, as understood by the database server. So up next, I'm going to talk to you a little bit about how to give even more detail not just about what kind of data is in each column, but something about how you intend to use that data in each column. So we just got done talking about whether it's an integer or a character, how long it is, whether it's like a gif image or whatever. Now, we're going to talk, additionally, not just about what's in the columns but how we're going to use those columns. And so, this whole thing of creating tables and defining columns has to do with how we're going to use these things. And now we're going to give it some more detail. So, here we're going to drop this table Users and we're going to create a brand new table and I'll focus first on this User ID column. And let me just sort of foreshadow upcoming lecture where we start having two tables and we got to connect one row in this table to another row on this table. So, this table has like row one, two, three, four. And then this other table has a row that we want to point at row four. So we do this by just putting four in there. And so these are called keys and these are called primary foreign keys. We'll get to all that. And so, the mechanism is we find that we want to enter rows into tables and have just some number. We don't care what the number is, as a matter of fact, we can let someone else come up with a number. And that's exactly what auto increment does. Auto increment is a way to tell the database that we're going to make an integer column and so if we look at this User ID column, it's an int which is short for integer. Unsigned meaning it's positive only. Not null which means it's required and auto increment means please supply it if I don't. And the way it supplies it is it starts with like one, two, three, four. You can insert a row and the database knows all about this stuff and so it just like oh four, next one's five. I know how to do that and so we don't have to in our programs keep track of this stuff. So this is a way to say that. And then we have the two fields, name and email, Varchar(128). So which we did exactly like we did before. And now what we're also going to tell it something about. Remember, I told you about these indexes that were like shortcuts to various places on desk. We're telling, with these next two statements, something about the data and how we're going to use them. So what we're saying with primary key for User ID? We're saying this column is something we're going to use a primary key which means we're going to use it a lot and you better be able to look it up really fast and you better throw some extra stuff on the desk. Extra bread crumbs here and there, so that we can get to that super fast. Okay? So, User ID is a number and I want a very good index and I want it really fast lookups. So the other one, this index, what we're saying there is that we're going to actually look up with where clauses using this a lot. Like where e-mail equals c7emashdot80u or even like clause and we might actually sort this a whole bunch and so we're telling it didn't change what we're going to store here, it's changing what we're going to expect or how we're going to use it. And then these are like hints. All of these are like hints, right? There are also rules. Primary key means it's got to be unique et cetera, et cetera, et cetera. But there also hints to say hey database while we're storing this data, later, I'm going to look at this e-mail address up a lot. I'm not going to look it up by name. I'm really going to look up in this table by e-mail address, so if you're going to make an index, go ahead and make the index on e-mail. I don't know if you're going to make it fast or slow but I'm just giving you a hint, right now, that we're going to use the where clauses with that e-mail and look strings up in there. So please be efficient. But you didn't tell it explicitly what you wanted to do. You just said later I'm like be doing stuff with this. So let's go ahead and create that table. Let me just copy this. Create table statement and go over here and let's see what's in here. Okay, we don't have a table. So I can make the table. If you did use drop table or get rid of it with a little couple of clicks. So, it's an auto increment field, this is another column and this is going to be our primary key for this row. And then we're going to add an index, a character based index, on the name field. And so ultimately, if you look at the structure of this table that I just created, it works pretty much the same as all those other ones. We still have name and email but you see the little key that's kind of telling us that's a good thing to be looked up on and primary key is that's like it knows this and it knows that it's auto increment et cetera, et cetera, et cetera. So if I for example go back in now and I insert a bunch of records. Let's go back to those inserts statements because you'll notice that I don't have to put User ID in. I just add a column called User ID and I don't have it here or here. So watch what happens. Come back. I'm going to insert four records and never specify the User ID. That's what auto increment does. As if I don't specify it, provide it for me. Oh and by the way, it's got to be unique. So I just did that. Let's take a look. Look at this. It just put them in. I can later, when we do this in PHP, I'll be able to pull this number back. I don't say what number did you give me? But it actually assigned all those things for me automatically. That's what auto increment is. Really powerful and when we get to the second major part of this. When we start talking about joints and multi-table stuff. This is going to be essential. But auto increment for now is just a way to make a column that's automatically set up. There's a lot of built-in functions like we talked about now, so you don't have to really know in PHP what the data is. You just say now and then say oh created at is a date time field and insert now into there and it works all the time. There's cool string functions etc etc.. I don't teach too many of these other than now because you can go find and stack overflow. You like what's the PHP, what is the mySQL function to take the first three characters of something and you like to cut and paste. You're done. So like I mentioned, indexes, if you don't tell it to index something and you select something, it may have to scan the entire table which is like tun tun tun tun tun tun and it's like oh that's going to take a while. If there's hundreds of thousands of records, these things can slow down. But the indexes, like I said, a trick based on, you know, it's basically a table of contents, right? So I'll go back and draw that picture again. Right. So the data is stored here and here and here and here and you've got to move in and out and wait for the thing to come around or you store a little bit table called the Index which is a shortcut to each of these records. And this is a smaller amount of data and reads this and says oh four, I'll read the index and I'll go straight to four. Rather than going one two three four. So it's like a set of shortcuts and that's grossly oversimplifying. These are amazing techniques and people have done Ph.D. thesis on how to do this better. This picture is the simplest of them. And so there are a whole Ph.D. thesis's about how to do this on a disk drive, how to do it on multiple disk drives, how to do it on SSD drives. That's the beauty of database work is that thousands and thousands of Ph.D. thesis's in Computer Science have been written to make this software fast and you just say select. And it all works. There's a couple of different kinds of indexes. There's a Hash or Tree. The Hash is used often in primary keys for exact matches and the Trees are used for sorting and prefects matches. But the one I showed you before, I didn't even tell what kind of index. The primary key index was I showed you that and the other one, it's very little space, exact match, no duplicates. But it's super fast for integer fields and you'll notice that that primary key that I put on wasn't an integer field. The index that I used on the other one, either Hash or B-Tree, is good for prefix lookups. The B-Tree is best for prefix lookups. Most people tell you to not even tell it to use Hash or B-Tree and let it adjust sometimes based on the data. This is kind of the internal structure of the B-trees. So remember I told you that you know you have data that sort of scattered all over the place. Or another way to think about it. Let's draw this picture a little bit differently, right? So, here is data and then there's a record here and a record here and record here, it might take you some time to scan through it. So you make this little index that's like a shortcut because you can go randomly. You could go sequentially or you could go randomly. So you have little index and it sort of points to this and these are shortcuts. So you save, you skip all this and skip all this and skip all this. And so B-Trees are one form of index, not the only form of an index. But the idea is there's a small amount of data that points to the larger chunks of data. And so this is a picture of what that might look like. So you can think of this as little block of data that has a series of numbers. The data is basically sorted. And what we say is we have a little, we store a little shortcut to another block of data and anything lower than seven is stored in that block of data. And anything between seven and 16 is stored in this block of data and anything above 16 is stored in this block of data. So if I'm saying oh I would like to look up nine. It comes, it reads this. Okay. And says oh that's, then it reads this and then it reads the nine one, which is out here and so it takes three instead of you know however many it would have had to go the other way. Now when you insert data in something like this. Let's say we're going to insert four. So, we'll put four in here. So we're going insert four. So what happens? This will get a little messy. Is four goes down here. And you look in here and it belongs right there. And it's like oh drat. And so what it does is it sort of moves everything. Now it might split this, so I'll just say let's split it, right? So I'll do this. One two three. One two four and another one that's five six. And then what it will do is it'll say you'll make a five here and then everything below five goes here. Everything five and above, between five and seven goes there. So it split this block into two. Now, you don't need to know this. It sort of has to adjust these things but when it's done adjusting, then you still have a fast way to get all this stuff. And do you really want to know this? And the answer is no, I don't want to know this. Somebody in Computer Science figured this all out. And actually, this is too simple. I mean there's better ways to do this but you don't need to know that. You do need to know that B-trees are good for sorted kind of material and prefix material, especially like a string. If you're looking up names and you're looking up names that started with Ch. You still could find the right spot and then you'd scan for all the Ch's basically. And so that's what B-Trees are good for. Hashes are a little more complex to understand. They use clever math. They use this thing called a Hash function and feel free. It's the thing in Python that makes dictionaries really fast and what it does is it basically does a simple calculation based on the key, like a thing you're looking up, like an e-mail address or a name. And that produces a number. So it runs a little thing, it might look at all the characters and add them up or divide by two or do some combination. But when it's all said and done, it gives you a number say from 0-15 and then based on this calculation, you say oh this one is three and so or actually this one is two and you calculate and you go and you can store it in a slot. And so this didn't take any talking to the disk drive. You just calculate this function. So you go right to the same place. Now the problem with hashing, just like I showed you in the last one, is you can't have more than one key hash to the same place then you do things like have little extra overflows or you split these things, et cetera, et cetera, et cetera. And so, hashing sometimes takes a little more time to get it reorganized. But then it's super fast because you don't even have to read that. You do the hash and you kind of read the disk a very few times. Okay? So Hashes and B-Trees are two different kinds of indexes as we saw in the very beginning. We can indicate what kind of an index we want to put on the various columns. This is a hint. It really doesn't change the syntax that you use but it's a hint. So we've said that this is going to be sort of a Hash-based super fast integer based index. This is more of a string based sorting prefix lookup kind of index. And what's really cool about it is if you forget to put these two things on and you find out later that your application is not running well, you can all come back with an SQL command to add the index later and sometimes there's even tools that will watch your database and say you know what, you should probably put a B-Tree index on e-mail. And like I mentioned, some of my colleagues say you should never say what kind it is and let it figure out the best thing. This part about using B-Tree or using Hash, you might want to say it because if it's strings, probably you want B-Tree. Otherwise you want Hash. But you can also just let the database decide that. What we're really saying is that I'm going to be looking stuff up with where clauses a lot with this email. And so I want you to store that Mr. Database as efficiently as you possibly can. So that's a quick zoom through SQL. And we sort of set up the shape things. We set of rules about the data. We give hints about how we might use that data. Create, read, update and relate. And it's not that hard. It really isn't that hard. And so the next thing we're going to talk about is how we make more than one table and start connecting those tables together and that actually is the real performance gain that we get. Right now, we come up with a very formal way to store data. But it's not really the thing that makes it go super fast, although indexes are cool but joints are the really cool thing and we'll talk about that next. 

Comments

Popular posts from this blog

SQL workshop

Introduction to Structured Query Language (SQL)

Introduction to Structured Query Language (SQL) My name : MODESTAS contact: GimtaMarkWell@execs.com coursera: 10.26 Mysql user name : modeirmode postal address: Perkaso takas XI KAUNAS Lithuania 2020 Programm language Should be PHP https://www.facebook.com/groups/2071393769785084/permalink/3046755458915572/