Auto-incrementing primary keys are useful for several reasons.
Auto-incrementing primary keys are useful for several reasons: They allow duplicate user names as on Stack Overflow They allow the user name (or email address, if that's used to login) to be changed (easily) Selects, joins and inserts are faster than varchar primary keys as its much faster to maintain a numeric index As you mentioned, validation becomes very simple: if ((int)$id > 0) { ... } Sanitation of input is trivial: $id = (int)$_GET'id' There is far less overhead as foreign keys don't have to duplicate potentially large string values I would say trying to use any piece of string information as a unique identifier for a record is a bad idea when an auto-incrementing numeric key is so readily available. Systems with unique user names are fine for very small numbers of users, but the Internet has rendered them fundamentally broken. When you consider the sheer number of people named "john" that might have to interact with a website, it's ridiculous to require each of them to use a unique display name.It leads to the awful system we see so frequently with random digits and letters decorating a username.
However, even in a system where you enforced unique usernames, it's still a poor choice for a primary key. Imagine a user with 500 posts: The foreign key in the posts table is going to contain the username, duplicated 500 times. The overhead is prohibitive even before you consider that somebody might eventually need to change their username.
11 Tell 'the professor' to read this answer ;) – subosito Nov 5 '10 at 3:49 6 Another point - ORMs hate non-integer keys. Whilst I feel this is a general failing on the part of most ORMs, you do learn to work with the tools – Phil Nov 5 '10 at 3:53 1 @meagar I hate visiting sites that require me to come up with a unique username for log in. I'd like to abandon the "username" for authentication entirely and move to email address only.
– Phil Nov 5 '10 at 3:58 1 @meagar: How do you uniquely identify a user if they don't have a unique name? I don't know many systems where I am expected to identify myself with a number. Usually I have a unique login name.
– sqlvogel Nov 5 '10 at 8:01 3 Non-unique displayed usernames faciliate social engineering - I wouldn't call that such a great idea. – Michael Borgwardt Nov 5 '10 at 8:11.
If the username is the primary key and a user changes his/her username, you will need to update all the tables which have foreign key references to the users table.
3 If the user can change his/her/its username then the username is by definition not a candidate primary key. – Bob Jarvis Nov 5 '10 at 11:01 2 The referential action ON UPDATE CASCADE is the usual answer to this 'straw man' argument. – onedaywhen Nov 5 '10 at 11:49 2 @Bob Jarvis: a property of a good key it that the values are stable.
Immutable is the ideal but rare changes only is good too. – onedaywhen Nov 5 '10 at 11:51 1 @onedaywhen: If you use ON UPDATE CASCADE, instead of changing the username in one place, you're changing it in potentially huge numbers of indexed fields. Meaning a massive performance hit.
And of course, you'll want to do it in a transaction, lest someone see a record that's linked to a username that no longer exists. In many DBs, this means locking the related rows. Granted, a user shouldn't be changing their username often.
But if two people do so at once, and they share any records in common (say, intra-site messages), you're looking at potential issues all over the place. – cHao Nov 8 '10 at 20:52 Surrogate keys do not eliminate problems related to key changes - they make certain problems more manageable but make others more complex. The statement that a changing attribute is not a candidate key is certainly false.
Any attribute can change, there are no exceptions - even for surrogates. By definition a candidate key is only required to be unique and irreducible - stability being a useful and important property but not an essential one. – sqlvogel Nov 8 '10 at 23:25.
This is typically called a surrogate key and it has many benefits. One of which is insulating your database relationships from the application data. More details and the corresponding disadvantages can be found at the wiki link provided above.
If you have demonstrated to your professor that assigning a unique arbitrary integer to each user is of value to your application then of course he would be wrong to say that it is "completely useless and unnecessary". However, maybe you missed his point. If he told you that the requirement is that "no two users can have the same username" then you haven't met that requirement.
Sincere thanks for posting your SQL DDL, it is very useful but most don't bother on SO. Using your table, I can do this: INSERT INTO users (username) VALUES (NULL); INSERT INTO users (username) VALUES (NULL); INSERT INTO users (username) VALUES (NULL); INSERT INTO users (username) VALUES (NULL); INSERT INTO users (username) VALUES (NULL); Which results in this: SELECT uid, username, passhash, email, rdate FROM users; uid username passhash email rdate 1 2 3 4 I think is the point your professor was trying to make: without enforcing the natural key on username you don't really have any data integrity at all. If I was the prof, I'd also urge you to remove nullable columns from your design.
Thanks, I did forget the NOT NULL and UNIQUE constraint. Here's an updated version: CREATE TABLE users( uid INT NOT NULL AUTO_INCREMENT, username VARCHAR(30) NOT NULL, passhash VARCHAR(255) NOT NULL, email VARCHAR(60) NOT NULL, rdate DATE NOT NULL, PRIMARY KEY(uid), UNIQUE (username,email) ); – CarlosMarx Nov 5 '10 at 22:51 UNIQUE (username, email) doesn't do what you think it does. You might try testing your own code.
– Catcall May 25 '11 at 17:02 @Catcall: that last comment was for @CarlosMarx, right? – onedaywhen May 25 '11 at 18:20 Yes, that was for @CarlosMarx. Sorry for not making that clearer.
– Catcall May 25 '11 at 19:00.
Because someone might want to change their username (or any name for that matter).
I'll need someone with more database knowledge to back me up on this one, but I believe you get a faster response in foreign key lookup time. Additionally, you may decide later that you want usernames to change, or that the requirements for usernames may change (maybe a longer string? ).
Using an ID prevents having to change all foreign keys. Lets face it, most projects aren't going to expand that much, but do you really want to risk the headache 12 months down the road, when you could conform to good programming standards now?
1 VARCHAR(4) takes as many bytes as INT; less that that is smaller than INT but that doesn't make it feasible depending on the rules. – OMG Ponies Nov 5 '10 at 4:15 Of course this is varchar (60) so it almost certainly will be slower. – HLGEM Nov 5 '10 at 15:58.
Your professor is doing the right thing by pointing out that you should have made username unique and not nullable if it was a requirement that user names should be unique. The uid could be a key as well but unless you are actually using it somewhere then it isn't needed. The more important aspect of the design ought to be to implement the natural key.So I agree with your professor's comment.
2 I agree with him that it should havea unique constraint, but it is almost always a mistake to use somethign so changeable as a username as PK. Integers are known to be be faster for joins than varchars of this size and updates to potentially millions of FK records when the user name changes can bring your system to a screaming halt. Someone teaching design should know better than to want a natural key for something this changeable.
– HLGEM Nov 5 '10 at 15:58 @HLGM: I've been on networks and such for almost 3 decades. I can't recall my username ever changing. – Catcall May 25 '11 at 17:03.
For instance, integer search (?id=5) is much way faster and has higher cardinality than string search (?username=bob). Another example, uid is auto_increment, so you don't have to insert it explicitly but it will auto increment in each insert query. PS: Your prof is soooo wrong about it :D.
1 If they are unique then the carnality of the string or number would be the same. If a hash index is used in each case then the seek performance would be about the same in each case. So on the information given there is no basis for saying one is faster than the other.In any case, the point of the question is correctness, not performance.
The professor is sooo right! – sqlvogel Nov 5 '10 at 12:34 @dportas, so are you the professor in question? The professor is so wrong I would fire him if he tried to do something this massively stupid to my databases.
– HLGEM Nov 5 '10 at 16:07 1 @HLGEM: I'm not him, but I might consider firing someone who was so stupid as to permit duplicate users in a user table - or at least I would re-educate them by making them define natural keys before surrogates. I expect the prof feels the same way. – sqlvogel Nov 5 '10 at 16:21.
We use ID to prevent duplication data and it can make some procces become not complicated (if we want to update or delete data), it more simple if we use ID. If you don't want to use ID you can use another fields. But don't forget to make them become UNIQUE.It can make your data become preventive from duplication data.
Another way outside PRIMARY is UNIQUE.
I go with all the answers above. I would say an ID is easy to implement and when it comes to indexing, Int is always preferred compared to a varchar. Your professor should know better, why would he say no to Int id is above me!
Because userid is supposed to be unique (cannot be duplicated) & sometimes is index.
I would never consider using a natural key that I might want to encrypt someday (or want to encrypt now).
The hash ought to make a good a good key. If you disagree then what natural key would you use to uniquely identify users? I think we agree that the original design is wrong but what suggestions do you have for the OP?
– sqlvogel Nov 5 '10 at 16:46.
I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.