You are going to make mistakes with PHP, 100% guaranteed. PHP is procedural. What you want are declarative constraints.
You want to tell the entire stack: "These are the constraints on the data, and these constraints cannot be violated. " You don't want to much around with "Step 1 ... Step 2 ... Step 3 ... Step 432 ...", as your method of enforcing constraints on data, because you're going to get it wrong when you change it later, you will forget what you did now nobody else will know all of these implicit constraints like you know them now, and that includes your future self it takes a lot of code to enforce constraints properly and all the time - the database server has this code already, but are you prepared to write it? The question should actually be worded, "Why should I use PHP to enforce these constraints, when I could just do it with MySQL?
Lol, exactly. :) – BobbyShaftoe Dec 19 '08 at 21:55 1 How do you interpret the exception thrown into a meaningful error message (without adding more code than you gained in the first place)? And do you disbelieve in validation in the client?
– le dorfier Dec 19 '08 at 22:02 2 That's not what this is about, validate all you want, but if you've forgotten anything you're not risking the integrity of your data, just a crappy error for the user. – ninesided Dec 19 '08 at 22:50 1 Security is implemented in layers, and so should data integrity. Client validation can be bypassed, then the PHP should validate that.
Database integrity constraints should catch the rest. – Calyth Dec 19 '08 at 23:21 3 If you want to catch errors, figure out what's wrong, and display meaningful information to the user, then by all means also implement checking and validation in PHP. But if that's the only place you implement it, you will get it wrong.
– Justice Dec 19 '087 at 14:37.
You can't "just" do it with php for the same reason that programmers "just" can't write bug-free code. It's harder than you think. Especially if you think it's not that hard.
Well .. if you cannot write down some bullet-proof insert\update code, you should concentrate on gaming software instead of database-management programs – Philippe Grondier Dec 19 '08 at 22:45 7 Gaming software is probably harder. In any case, I'll be out of a job as soon as the world discovers Philippe Grondier's bug-free code. – recursive Dec 19 '08 at 23:02.
If you can swear for the life of you that nothing will ever access the DB though any other means then your (of course bug-free) PHP page, then doing it with PHP alone will be fine. Since real-world scenarios always contain some uncertainty, it is good to have the DB server watching the integrity of your data. For simple databases, referential integrity constraints might not be an absolute requirement, but a nice-to-have.
The more complex the application gets, the more benefit can you draw from them. Planning them in early makes your life easier later. Additionally, referential integrity does it's part in forcing you to design the database in a more by-the-book manner, because not every dirty hack is possible anymore.
This is also a good thing.
They are quite important. You don't want to define your model entirely through PHP. What if there is a bug in your PHP code?
You could easily have null'ed columns where your business rules state you should not. By defining it at the database level, you at least get that check for free. You're going to really hate it when there are bugs in your PHP or if any other tool ever uses your database.
You're just asking for problem, IMHO. Be advised, this is the very short version of the story.
It's important to implement constraints in the database because it's impossible to predict the future! You just never know when your requirements will change. Also consider the possibility that you may have multiple developers working on the same application.
You may know what all the constraints are, but a junior developer may not. With constraints on the database, the junior developer's code will generate an error, and he'll know that something needs to be fixed. Without the constraints, the code may not fail, and the data could get corrupt.
I'm usually in favor of declaring constraints in the database. Arguments for constraints: Declarative code is easier to make bug-free than Imperative code. Constraints are enforced even if app code contains bugs.
Supports the "Don't Repeat Yourself" principle, if you have multiple applications or code modules accessing the same database and you need business rules to be enforced uniformly. If you need to change the constraint, you can do it in one place, even if you have many apps. Enforces data integrity even when people try to bypass the application, using ad hoc query tools to tinker with the database.
Enforces consistency which means that you can always be certain the data is in a valid state before and after any data update. If you don't use constraints, you may need to run periodic queries to check for broken references and clean them up. You can model cascading update/delete easily with constraints.
Doing the same thing in application code is complex and inefficient, cannot apply changes atomically (though using transaction isolation is recommended), and is susceptible to bugs. Constraints help databases be more self-documenting, just as column names and SQL data types help. Arguments against constraints: More complex business rules cannot be modeled by declarative constraints, so you have to implement some in application space anyway.
Given that, why not implement all business rules in one place (your app) and in the same language? This makes it easier to debug, test, and track code revisions. Constraints often involve indexes, which incur some amount of overhead during inserts/updates.
On the other hand, even if you don't declare a constraint, you probably need an index anyway, because the column may be used in search criteria or join conditions frequently. Constraints can complicate your attempts to "clean up" mistakes in the data. In your current project, the incompatibility of MyISAM vs. InnoDB with respect to referential constraints is causing some grief.
The most important thing about using NOT NULL to me, is more the documentation part. When I return to the project after a few months I forget which columns it is acceptable to have nulls in. If the column says NOT NULL, then I know I will never ever have to deal with potential null values from it.
And if it allows null, then I know for sure I have to deal with them. The other thing is, as others have noted.. You may miss something somewhere, and cleaning up data sucks, or may be entirely impossible. Its better to know for sure that all data in your database is consistent.
Enabling these constraints in MySQL takes almost zero time. If they save you from even a single bug due to faulty PHP or other code, isn't that worth it? Keep in mind that the sorts of bugs you'll save yourself from can be rather nasty.
Finding and fixing the bug itself may not be hard; the nasty part is that once you've fixed the bug you'll be left with a bunch of faulty data that may not even be salvageable. I wouldn't even approach this problem from the "well, something other than PHP might access your data someday" angle. That's true, but even more important in my mind are the the headaches, time (money) and data loss that you can save yourself simply by adding a few simple constraints.
I've asked people how to enable foreign keys in MySQL, and it is rather difficult. Both tables must be InnoDB, which I am told is a bad idea. Also, the table that's being referenced has already been created, and is holding data.
Can anyone help? – stalepretzel Dec 19 '08 at 22:06 Why is it a bad idea? Anyway, "ALTER TABLE tablename ENGINE=InnoDB" can convert a populated table from MyISAM to InnoDB.
Although it takes time depending on the amount of data. – Bill Karwin Dec 19 '08 at 23:11.
Use the database for structural data integrity, and use the BR layer for the rest. And catch errors as early as possible. They work together.
With luck, when your code as matured, you won't experience databse RI errors; and you can proudly announce yourself to be the first.
That's it! So I'm not the only one to think this way... – Philippe Grondier Dec 19 '08 at 22:42.
Having your data tier enforce data consistency through constraints helps to ensure your data remains consistent and provides cheap runtime bug checking within your application. If you think constraints are not worthwhile you either have a small/non mission critical system or you are passing up a huge opportunity to improve the quality of your system. This cannot be understated.
Choices include: choosing a different RDBMS, reinvent your own metadata system or manually manage constraints. Manual management in queries without a metadata system quickly becomes infeasible to maintain and audit properly as schema/system complexity grows and unecessarily complicates an evolving schema. My recommendation is to choose a different RDBMS.
Consistency checking is much harder than you may think. For example MySQL uses transactional read consistency which means the values you are checking against may not be the same values in the scope of another transaction. Consistency scemantics for concurrent access are very very difficult to get right if not bound directly to the data tier.
When all is said and done, even with a modest amount of effort put into manual checking, the likely outcome is that one would still be able to drive a truck through the corner cases you have failed to consider or committed an error in forming. On your NOT NULL question... The obvious data field requirements are a good starting point. Here are a couple of other things to consider when defining column nullability.It provides a guarantee that can very helpful when writing queries.
Various joins may use NULL conditions to show a non-match of a table row separate from a NULL value that cannot be assumed if the condition allows nulls. (If NULLs are allowed, a match can mean either the row did not match or the row did match but the column value is null.) The use of NOT NULL also helps define the rules for simpler queries matching values. Since you cannot say "WHEN value1 = value2" if both value1 and value2 are NULL the result of the evaluation is still false.
I don't think you can be certain that your database will only be accessed by PHP and if so, by developers who will use it to respect those constraints for the entire lifecyle of your database. If you include these constraints in your schema, then one can get a good idea of how the data is used and related by investigating your schema. If you only put all that in the code, then someone would have to look in both the database and the PHP code.
But shouldn't that stuff be in the design documentation, data dictionary, and logical database design? Yes, but these documents are notorious for getting out of date and stale. I know you would never allow that to happen, but some people who have experience with projects with less discipline may assume this about your project, and want to consult the actual code and schema rather than documentation.
I'm afraid this is a Religious Topic. From a puristic point-of-view, you want the database to do the referential integrity. This is ideal when you have a multiplicity of applications accessing the database, because the constraints are in one place.
Unfortunately, real world! = ideal. If you have to enforce some sort of referential integrity, in my experience, your application Will Need To Know How To Do This.
This is regardless of whether it is the final arbiter, or the database checks it as well. And even if the database does do the referential integrity, then the application has to know what do if the database rejects an update because referential integrity would be violated... As a sidenote, setting up MySQL to support foreign key constraints is a bit of a process because you need to shift to InnoDB. If you do just that, you can get a lot of performance back by setting innodb_flush_log_at_tx_commit to 1.
But it probably would be better if you can instead re-engineer your site to be transaction aware. Then you get two benefits of InnoDB.
Even if your PHP code is perfectly bug-free, it may stop mid-script (out of memory error, segfault in some library, etc), leaving half-inserted stuff in the database, hence the importance of using InnoDB and transactions. Same for constraints, of course you should have proper form validation, and database constraints behind it to catch bugs. Database constraints are easy to specify, finding bugs in the application is hard, and even harder without constraints.My experience has been that improperly constrained databases, and anything that uses MyISAM, WILL have inconssitent data after a few months of use, and it is very hard to find where it came from.
I highly appreciate your question, as I am deeply convinced that default-value rules should be implemented on the code-side, not on the database-side, and this for a very simple reason: when users are the one that initiate database changes (INSERTS, SELECTS and UPDATES), these changes shall integrate all business rules, and default values are basically business rules: There is no invoice without invoice number There is no invoice line without a quantity, and 0 or nulls are not acceptable There is no incoming mail without date of reception etc We have decided a few years ago to get rid of all these "database-side" artefacts like "not null", "(do not) allow empty strings", and other "default value" tricks, and it works perfectly. Arguments in favor of the default value mainly refer to a kind of "security" principle ("do it on the database side because you will forget to to it on the code side / your language is not made for that/it's easier to do it on the database side") that does not make any sense once you have chosen not to implement any default value on the database side: just check that your business rules are properly implemented while debugging. For the last 2 years, nobody in the team ever thought of declaring a default value in a table.
I guess that our younger trainee does not even know about something that is called "default value". EDIT: rereading some of the answers here, my final comment would be: do it on any side, either DB or code, but make your choice and do it on one side only! There is nothing more dangerous than having such controls on both sides, because eventually (1) you'll never know if both sides are really implementing the same rule, meaning that (2) checking the rules will mean checking both sides, which can really become a mess!
The worst situation is of course when one part of the job is done on the database side (ie the rules that were identified when the database was created) and the other part (ie the newly identitified rules) done on the client side ... nightmare ....
I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.