Thursday, August 25, 2005

 

Constraints vs Rules

Most organisations take data integrity very seriously. Indeed, it is Codd's 10th rule of relational data management. So the question isn't whether or not to manage the integrity of your data, but rather "how".

In most situations, I advocate managing the integrity of your data at the server level, as opposed to the application or procedure level. Using the server's data integrity is generally the most efficient and reliable way to maintain the integrity of your data.

In Sybase ASE, there are at least two good ways to manage the integrity of your data: Constraints and Rules. Usage for both are covered in the Transact-SQL User's Guide.

Constraints are covered in Chapter 7: Creating Databases and Tables
http://manuals.sybase.com/onlinebooks/group-as/asg1250e/sqlug/@ebt-link;pt=19279?target=%25N%15_20741_START_RESTART_N%25

Rules are covered in Chapter 12: Defining Defaults and Rules for Data
http://manuals.sybase.com/onlinebooks/group-as/asg1250e/sqlug/@ebt-link;pt=19279?target=%25N%15_37302_START_RESTART_N%25

Let's look first at an example of a column-level constraint.

create table atable (a int check (a > 0), b int constraint b_cons check (a > b))

insert into atable values (0, 0)

Check constraint violation occurred, dbname = 'test1', table name = 'atable', constraint name = 'atable_a_1771473076'.

Since we didn't assign a constraint name, we got an rather uninteresting name for our constraint. Normally we'd have to run a query like the following along with sp_helpconstraint or a look at syscomments to figure out what the constraint does.

select o1.name constraint_name, o2.name table_name, col.name column_name
from sysobjects o1, sysobjects o2, sysconstraints c, syscolumns col
where o1.id = c.constrid
and o2.id = c.tableid
and col.id = c.colid

That's why I prefer naming the constraints, like we did with the second one.

insert into atable values (1, 1)

Check constraint violation occurred, dbname = 'test1', table name = 'atable', constraint name = 'b_cons'.

If you're still not happy with this message, you can write your own and bind it to the constraint.

sp_addmessage 25001, "The b value must be greater than the a value"
sp_bindmsg b_cons, 25001

In the Reference Manual, Volume 4: Tables, you will see that you can also use: in, between, like and standard comparisons (<>=)

Now we saw that we can compare two columns together, let's try something more sophisticated.

For slightly more advanced constraints over multiple columns, we can use table-level constraints instead.

create table atable (a int, b int, constraint atable_cons check ((a > 0) and (a > b)))

Check constraint violation occurred, dbname = 'test1', table name = 'atable', constraint name = 'atable_cons'.

The two primary advantages to using Constraints over Rules are better performance and better portability.

Since constraints are binded directly to the table, the constraint itself is generally in memory. A rule may have to get fetched from the cache or, worse yet, the disk.

The reason constraints have better portability is that they are ANSI SQL standard. Every ANSI-compliant database should support constraints. Rules are an add-on.

Let's take a closer look at rules. The syntax and description for rules can be found in Reference Manual, Volume 2: Commands.

Rules are stand-alone objects. Once they are created, we can bind them to as many tables, columns, and procedures as we like. Here's an example.

create rule a_rule as @a > 0

create table atable (a int, b int)

sp_bindrule a_rule, 'atable.a'

A column insert or update conflicts with a rule bound to the column. The command is aborted. The conflict occured in database 'test1', table 'atable', rule 'a_rule', column 'a'.

One of the disadvantages of rules is that they can't reference multiple columns in a table like our constraint could.

create rule b_rule as @b > @a

Rule contains more than one variable.

So when should you use rules instead of constraints? The main reason is when you have an identical restriction on many tables, it might make sense to share a single rule rather than create separate constraints. If the rule changes, you only need to update it in one place, and you're assured that consistency.

There are other ways to maintain your data integrity. The other method to keep foremost in your mind is by using triggers. Triggers can handle much more advanced cases, because they can contain subqueries and reference other tables. So for advanced constraints, look to triggers.

In Summary:
- Constraints generally perform faster than rules
- Constraints can access multiple columns in a table
- Constraints are ANSI-SQL standard
- A single rule can be applied to multiple tables to assure consistency
- Triggers are most effective for very advanced data integrity conditions

Note: There are two new posts, the other one is on Compute

Comments:
Awesome stuff. How do I convince a client that this is better than embedding all kinds of validation in their SQL? It's like they don't understand that the concept of encapsulation applies to DB work. Argh!
 
Then prove it to them. It should be easy to build a test case demonstrating the greater speed of checking constraints at the db-level as opposed to the procedure-level.

And if that doesn't convince them, you can ask them, point-blank, do they trust ALL of their developers to check ALL appropriate inputs before EVERY insert, update and select?

Even if they do, it takes time for developers to consider what is appropriate and make the proper checks (and it bloats the code).

Instead, do it once, and do it right. Do it when you are designing your data model. Do it at the database-level.
 
This comment has been removed by a blog administrator.
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?