Dependency Injection Via the C# 'new' Keyword

by msimpson 11/19/2009 5:05:00 PM
For the last year or two I've been getting into the principle of Inversion of Control (IoC) as a way of decoupling pieces of my applications.  While I think the name 'Inversion of Control' is a pretty poor choice because you can't really infer much from it, the principle itself is valuable.  In my case, I've been exploring a specific technique for IoC called Dependency Injection (DI).

To understand what DI can do for you, you first need to understand the problem it's trying to solve.  In any reasonably complex piece of object-oriented software, there will be multiple classes organized into layers and/or tiers.  If the programming language is strongly-typed, then ideally, the developers will have formalized the interactions between them using interfaces, which define chunks of abstract functionality.  These interfaces allow calling code to be agnostic of the actual type of the object it's calling, which in turn allows the concrete implementations of the called code to be swapped out without the calling code ever being the wiser.

Most OOP developers will agree that the above is a huge step towards decoupling the codebase.  Interfaces are old hat, right?  However, as the application grows, another type of dependency manifests itself: construction logic.  The calling code needs to obtain an object that implements the interface it's interested in.  Now, the calling code is not supposed to care what type of object that actually is, but if it's going to create it, it's going to have to know, since an interface cannot be instantiated directly.  And poof! - the glorious agnosticism we thought our interface was granting us has disappeared.

Now extrapolate a little further, and imagine that the object we're trying to create has its own dependencies, which we need to pass into it when we create it.  And imagine that those classes also have their own dependencies, and so on, ad infinitum.  Suddenly our poor calling code, which really only wants to call IFoo.Bar(), now has to know how to create the entire universe before it can do anything.  Now imagine you've got a dozen classes that need to call IFoo.Bar() - that means a dozen classes that need to know how to create the universe.  If you decide to change the constructor for the MyFoo class you're calling, you need to visit each of those dozen calling classes to update their construction logic.  And what if you want to change from MyFoo to YourFoo?  Same thing.  Refactoring tools are getting better all the time, but they're not that good yet, and may never be.
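To make this concrete, here is a minimal C# sketch of one such caller.  Only IFoo, Bar() and MyFoo come from the text above; ILogger, IRepository, InMemoryRepository, ConsoleLogger and ReportGenerator are hypothetical names made up for illustration.  The point is the single line in ReportGenerator that has to spell out the whole object graph:

```csharp
using System;

// Hypothetical types for illustration; only IFoo, Bar() and MyFoo come from the text.
public interface ILogger { void Log(string message); }
public interface IRepository { string Load(); }
public interface IFoo { string Bar(); }

public class ConsoleLogger : ILogger {
    public void Log(string message) => Console.WriteLine(message);
}

public class InMemoryRepository : IRepository {
    private readonly string _data;

    public InMemoryRepository(string data) {
        _data = data;
    }

    public string Load() => _data;
}

public class MyFoo : IFoo {
    private readonly IRepository _repository;
    private readonly ILogger _logger;

    public MyFoo(IRepository repository, ILogger logger) {
        _repository = repository;
        _logger = logger;
    }

    public string Bar() {
        _logger.Log("MyFoo.Bar called");
        return "Bar: " + _repository.Load();
    }
}

// One of the "dozen" callers.  It only wants IFoo.Bar(), yet it must know
// MyFoo's constructor and how to build everything MyFoo depends on.
// Changing that constructor, or switching to YourFoo, means editing this
// line in every caller.
public class ReportGenerator {
    public string Run() {
        IFoo foo = new MyFoo(new InMemoryRepository("sample data"), new ConsoleLogger());
        return foo.Bar();
    }
}
```

Multiply that one construction line by a dozen callers, and every change to MyFoo's constructor becomes a dozen edits.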

The lesson?  Construction logic is just as much of a dependency as a hard reference to a concrete type, and we need to externalize it.  

The traditional way to do that is through the use of factory classes.  These are classes designed expressly to create new instances of objects (probably objects implementing a particular interface) and return them to the caller.  Factories do help, but they don't go far enough.  Chances are your application will need many different factories, say one for each interface.  Construction logic will be sprinkled across all your factory classes.  The calling code needs to know how to access each one, and they may differ in how they're called and where they live.  And you have to write the factories.
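A sketch of the factory approach, assuming a hypothetical IFoo/MyFoo pair where MyFoo depends on an IBaz (IBaz, MyBaz, FooFactory and BazFactory are made-up names).  Callers now write FooFactory.Create(), which hides MyFoo's constructor, but notice that there is one factory per interface and that each one had to be written by hand:

```csharp
// Hypothetical types; the factory classes are hand-written, not from a library.
public interface IBaz { string Name { get; } }
public interface IFoo { string Bar(); }

public class MyBaz : IBaz {
    public string Name => "MyBaz";
}

public class MyFoo : IFoo {
    private readonly IBaz _baz;

    public MyFoo(IBaz baz) {
        _baz = baz;
    }

    public string Bar() => "MyFoo using " + _baz.Name;
}

// One factory per interface.  Callers write FooFactory.Create() instead of
// new MyFoo(new MyBaz()), but construction logic is now split across
// factories, and every caller still has to know which factory to ask.
public static class BazFactory {
    public static IBaz Create() => new MyBaz();
}

public static class FooFactory {
    public static IFoo Create() => new MyFoo(BazFactory.Create());
}
```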

Enter Dependency Injection.  This is typically implemented as a library that provides the following:

  • A simple interface for getting instances of types based on certain criteria;
  • A way to easily configure what types get returned for what criteria;
  • A 'super-factory' to create objects and return them;
  • A way to specify to the factory what dependencies should be satisfied, and how, before returning newly-created objects.

A DI engine is typically called a 'container', but I called it a 'super-factory' above because it does what a normal factory does, but provides a ton of extra features.  If configured properly, it can not only instantiate MyFoo, but when it does, it will satisfy MyFoo's dependencies, and those classes' dependencies, and so on, all before returning your MyFoo to you.  You can configure switches to change out whole sets of types, for example 'release', 'debug' and 'unit test' (where you might want to substitute a mock data access layer instead of the real one).  It can call a method of yours to create objects, if you want control over how the objects are created but want to centralize that logic.  It can decide on-the-fly between multiple available implementations of an interface, based on context available at runtime.  It can manage singletons for you, without your having to write your classes as singletons.  And you don't have to write the factory, just configure it.
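To show what a 'super-factory' does under the hood, here is a deliberately tiny container in C#.  It is not Ninject and not how Ninject is implemented; Container, Bind<TService, TImpl>() and Get<T>() are hypothetical names for this sketch, as are the sample types at the bottom.  It uses reflection to pick a constructor, resolves each constructor parameter recursively, and hands out a single shared instance when a binding is marked as a singleton:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A toy "super-factory" for illustration.  It is NOT Ninject's implementation;
// the Container class and its Bind/Get methods are assumptions for this sketch.
public class Container {
    private readonly Dictionary<Type, Type> _bindings = new Dictionary<Type, Type>();
    private readonly HashSet<Type> _singletonServices = new HashSet<Type>();
    private readonly Dictionary<Type, object> _singletons = new Dictionary<Type, object>();

    // Configuration: map a service type to the type that implements it.
    public void Bind<TService, TImpl>(bool singleton = false) where TImpl : TService {
        _bindings[typeof(TService)] = typeof(TImpl);

        if (singleton) {
            _singletonServices.Add(typeof(TService));
        } else {
            _singletonServices.Remove(typeof(TService));
        }
    }

    public T Get<T>() => (T)Get(typeof(T));

    private object Get(Type service) {
        if (_singletons.TryGetValue(service, out object existing)) {
            return existing;
        }

        // Use the bound type, or the requested type itself if it is concrete.
        Type impl = _bindings.TryGetValue(service, out Type bound) ? bound : service;

        if (impl.IsAbstract) {
            throw new InvalidOperationException("No binding for " + service.Name);
        }

        // Choose the public constructor with the most parameters and resolve each
        // parameter recursively: dependencies, their dependencies, and so on.
        // (No cycle detection here; a circular graph would overflow the stack.)
        var ctor = impl.GetConstructors()
            .OrderByDescending(c => c.GetParameters().Length)
            .First();
        object[] args = ctor.GetParameters()
            .Select(p => Get(p.ParameterType))
            .ToArray();
        object instance = ctor.Invoke(args);

        if (_singletonServices.Contains(service)) {
            _singletons[service] = instance;
        }

        return instance;
    }
}

// Hypothetical application types.
public interface IBaz { string Name { get; } }
public interface IFoo { string Bar(); }

public class MyBaz : IBaz {
    public string Name => "MyBaz";
}

public class MyFoo : IFoo {
    private readonly IBaz _baz;

    public MyFoo(IBaz baz) {
        _baz = baz;
    }

    public string Bar() => "MyFoo using " + _baz.Name;
}
```

With this toy, calling `kernel.Bind<IBaz, MyBaz>(singleton: true);` and `kernel.Bind<IFoo, MyFoo>();` once at startup means `kernel.Get<IFoo>()` returns a MyFoo whose IBaz was created (and shared) automatically.  A real library adds much more, such as contextual bindings, scopes, cycle detection and better error messages, and its configuration syntax differs; Ninject's, for instance, is the fluent `Bind<IFoo>().To<MyFoo>()` form.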

There are many dependency injection libraries available, and they're all pretty much equivalent in terms of functionality, but they differ in how they're configured.  I've been using Ninject, which has cool cartoon ninjas on its web site, but more importantly uses a type-safe 'fluent' interface for configuration.  Some other libraries use XML configuration files instead.  Take your pick.

With Ninject, assuming I have a reference to the Ninject kernel, if I want to get an object that implements IFoo, I can do this:

    var foo = kernel.Get<IFoo>();
    
Ninject sees that I want an IFoo, looks at its configuration, sees that IFoo is bound to the MyFoo class, creates an instance of MyFoo, and returns it to me.  My class has taken a dependency on Ninject, but successfully avoided any dependency on MyFoo, MyFoo's dependencies, or any other concrete class Ninject is configured to provide.  Slick!

In case you're interested, the binding configuration is like so:

    Bind<IFoo>().To<MyFoo>();
    
This code is normally called once, when the app starts up.  This is a very simple example.

This is all well and good, but looking back at the call to IKernel.Get<T>(), I can see room for improvement.  IKernel.Get<T>() is not as clean as new.  How do you enforce that developers don't use the new keyword and create objects directly?  And what if you have a legacy application that doesn't use a DI container?

It seems to me that dependency injection is something that ought to be present in the language itself.  Ideally, I'd like to co-opt the new keyword and make it do dependency injection.  This would have profound implications for the language as a whole.  It would need to be implemented in the compiler, affecting type inference, and it might prevent the compiler from making certain optimizations and type safety checks.  I admit I have not thought through all the possible implications, and I don't really have the expertise to do so, not being on the C# compiler team.

But here's how I imagine it:

  • Microsoft would provide a syntax and API for obtaining objects, equivalent to the Get<T>() method above and its siblings.  I imagine both a 'language-integrated' syntax based on new, and a separate programmatic API, so you could use either.  
  • Microsoft would provide a DI implementation, based on the provider model, that by default simply instantiates the requested type and returns it.  DI providers could be implemented by third parties and hooked in via configuration.
  • By default, the new keyword would do what it does now - simply instantiate an object, without using DI at all, so as not to affect performance or risk differences in behavior.
  • However through a switch, the compiler could be instructed to 'turn on DI'.  When the app is recompiled and run, every use of the new keyword results in a call to the currently-configured DI provider.
  • Microsoft would release updates to legacy .NET Framework versions to provide the above capability (requiring recompilation to take advantage of it).
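Since no such compiler support exists, here is one way the provider model above might look if written as ordinary C#.  Everything in it (IObjectProvider, DefaultObjectProvider, MappingObjectProvider, ObjectProvider.New<T>()) is a hypothetical name invented for this sketch.  The idea is that, with the switch turned on, the compiler would rewrite `new IFoo()` into a call like `ObjectProvider.New<IFoo>()`, and the configured provider would decide what to build:

```csharp
using System;
using System.Collections.Generic;

// Everything here is hypothetical: a sketch of the provider model that a
// DI-aware `new` might call into.  No such API exists in .NET.
public interface IObjectProvider {
    object Create(Type requested);
}

// Default provider: behaves like today's `new` (in this sketch, only for
// types with a public parameterless constructor).
public class DefaultObjectProvider : IObjectProvider {
    public object Create(Type requested) => Activator.CreateInstance(requested);
}

// A stand-in for a third-party provider that maps interfaces to classes.
public class MappingObjectProvider : IObjectProvider {
    private readonly Dictionary<Type, Type> _map = new Dictionary<Type, Type>();

    public MappingObjectProvider Map<TService, TImpl>() where TImpl : TService, new() {
        _map[typeof(TService)] = typeof(TImpl);
        return this;
    }

    public object Create(Type requested) {
        Type target = _map.TryGetValue(requested, out Type impl) ? impl : requested;
        return Activator.CreateInstance(target);
    }
}

public static class ObjectProvider {
    // In the proposal this would be chosen by configuration.
    public static IObjectProvider Current { get; set; } = new DefaultObjectProvider();

    // With the imagined compiler switch turned on, `new IFoo()` would compile
    // to something like ObjectProvider.New<IFoo>().
    public static T New<T>() => (T)Current.Create(typeof(T));
}

public interface IFoo { string Bar(); }
public class MyFoo : IFoo { public string Bar() => "MyFoo"; }
public class YourFoo : IFoo { public string Bar() => "YourFoo"; }
```

Swapping the provider, for example `ObjectProvider.Current = new MappingObjectProvider().Map<IFoo, YourFoo>();`, changes what every `ObjectProvider.New<IFoo>()` call returns without touching the callers, while unmapped concrete types still pass straight through.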

Imagine the advantages of such a change:

  • We would be able to make huge changes to legacy apps, without having to wade through the code, needing only a recompile and some configuration;
  • The syntax for getting objects from a DI container could be consolidated with that for manual/traditional construction based on the new keyword.  This would clean things up compared to all current DI implementations that I know of.
  • The learning curve for new DI frameworks would be drastically reduced, since they would all support the same interface for getting new objects.  Of course their configuration APIs would still differ.
  • We would be able to instantiate interfaces, like so:
    var foo = new IFoo();
  • As I mentioned above, when you use a DI container, you pretty much need a dependency on it across the board - and you will have a very hard time switching to a different DI container in the future.  This would help eliminate that problem, especially if you can stick with new and avoid using the API for obtaining objects.

I imagine someone else has thought of this already, and I'd like to hear from anyone who knows more about the idea.

Software

C-Style Formatting

by msimpson 10/29/2009 4:54:00 AM
I figure what the world definitely needs is more discussion about whether, in C-style programming languages, to put braces on the same line or on the next line.  Or maybe I just don't have anything better to do today.  Regardless, without further ado, let me describe my preferred formatting standard:
 
  1. Use a monospace font.
  2. Use tabs instead of spaces, with a tab size of 4.
  3. Separate methods, properties, functions, etc. with one blank line.
  4. Put opening braces on the same line as the construct, separated from it by one space.
  5. Put each closing brace on its own line.
  6. Always include braces for statements that potentially define code blocks, even if the code block is only one line.
  7. Place one blank line before a statement that opens a code block, except when the previous statement opens the containing code block.
  8. Place one blank line after the close of a code block, except when the following line closes the containing code block.
  9. Put one space between comment-declaration symbols and the actual comments.
  10. Put one space between for, if, and similar keywords and the parenthesized expressions that follow them.
  11. In a for expression, put one space after each semicolon.
Here's an admittedly contrived example:
 
using System;

namespace Simpson {
	public class Foo {
		public static void Main(string[] args) {
			var x = new Random().Next(3);

			if (x == 0) {
				Console.WriteLine("Foo!");
			} else {
				if (x == 1) {
					Console.WriteLine("Baz!");
				} else if (x == 2) {
					Console.WriteLine("Bok!");
				} else {
					Console.WriteLine("Bug!");

					if (args.Length > 1) {
						Console.WriteLine("Arguments:");

						foreach (var arg in args) {
							Console.WriteLine("\t" + arg);
						}
					}
				}
			}
		}
	}
}
 
The biggest battle seems to be over placement of curly braces.  Here’s my justification for doing it the way I do:
  • Logically, the opening curly brace is not a statement.  It’s part of the statement that opens the code block, and shouldn’t go on a separate line by itself. 
  • Since I never put multiple statements on one line, there’s never any code to the right of a curly brace.  Therefore, putting it at the end of a line does not displace any other code.  By contrast, putting a brace on the next line pushes all the code after it down by a whole line.  Therefore, you will fit more code on the screen at once if you put braces on the same line.
  • Fitting more code on the screen at once makes it easier to see blank lines at a glance, which makes them more effective as tools for visually grouping logically-related code.
  • Putting braces on the same line makes the code wider and shorter, whereas putting them on the next line makes it narrower and taller.  The former style fits better on new widescreen monitors.
  • Code with lots of braces can get very tall and sparse if you consistently put braces on the next line.  So in these cases, sometimes people will break their own rules and put the braces on the same line.  It seems easier to me to be consistent if you always put braces on the same line, rather than do the opposite and be tempted to break the rule now and then.
  • Fitting more code on the screen at once can allow for a larger font size when you ‘zoom out’ to show the whole file (assuming your editor can do this).
  • Finally - in JavaScript, putting opening braces on the next line can lead to errors, due to JavaScript’s automatic semicolon insertion, which can treat a line break as the end of a statement.  For example, this:
return
{
    age: 12
}
 
produces different results than this:
 
return {
    age: 12
}
 
In a function, the first one returns ‘undefined’ (probably not the intended result), because JavaScript inserts a semicolon right after return; the second one returns an object with an ‘age’ property set to 12.  In JavaScript, it’s good defensive coding to always use semicolons to terminate statements, and to always put opening braces on the same line as the construct.  And you’ll benefit by adopting the same standard for all C-style languages – making it a habit will reduce the mental cycles you’ll have to spend thinking about it.
 
As far as I can tell, the only reason people put a curly brace on the next line is to align it with the closing brace, so as to save a few brain cycles when reading the code.  But the statement that actually opens the code block, and “owns” the opening brace, is still aligned with the closing brace.  All it takes is a little practice to retrain your brain to visually associate code (the opening statement) with the closing brace, and I maintain that the benefits of the same-line style far outweigh the effort needed to retrain yourself.

Software

My Ideas For Health Care Reform

by msimpson 9/30/2009 3:52:00 AM
A couple of months ago my younger son, Nolan, broke his arm at Super Hero Camp at the YMCA, because he apparently couldn't actually fly.  The YMCA called my wife, who called our pediatrician, who referred her to the emergency room at a local hospital.  When they arrived, Nolan was given a Tylenol and sat on a hospital bed for four hours or so, until a doctor arrived.  The doctor spent five minutes or so setting the bone, and then it took another half hour or so to get a cast on it.

And what was our out-of-pocket bill for a simple broken arm, with no surgery or other complications, involving only an hour or so of a doctor's time in total?  Over $3,000.  This was reduced somewhat from the off-the-street rate because my health plan had a negotiated rate with the hospital, but it is still pretty outrageous in my opinion.

In our case, there were some things that could have been done differently.  Our pediatrician could have referred us directly to the orthopaedist who set the bone and put the cast on it, rather than sending us to the emergency room (the most expensive place to go).  Nolan could have sat in the lobby rather than on a hospital bed until he was treated; hospitals charge big bucks for ER beds by the hour.

But to me, the whole episode shows that our current health care system is horribly broken.  I don't consider myself a free-market evangelist, and I think this example shows how the free market can fail if left alone, but I also think that our system is not actually working as a market should.  Here are the problems I see:

First, the stakes for patients are sometimes very high - life and death.  In economic terms, you could say that the demand is very inelastic; when people need treatment, they often can't choose not to get it.  This has the effect of raising prices; in a nutshell, providers have patients over a barrel, and patients don't stop to think how much things cost.

Second, stakes are high for providers as well.  They deal with large numbers of patients, some of whom may act irrationally due to their own high stakes.  Some patients will sue a doctor whether the doctor's care was good or not.  Providers must purchase malpractice insurance, and because the patients' stakes are high, judgments are high, and therefore the insurance can be expensive.  Doctors do whatever they can to avoid a chance of misdiagnosis or other mistakes, and err on the safe side by ordering lots of tests, some of which may not really be needed.

Third, medical technology, fueled by inelastic demand, is improving at a fast pace.  New technology is always expensive.  While in some cases this technology can make things more efficient, I think in most cases it takes significant time (years) to reach a break-even point.  I think providers buy these tools partly for bragging rights, and partly in order to treat conditions that would otherwise be untreatable.  Patients with those conditions stay alive longer, which is a good thing, but this results in higher costs overall because these patients need more care.

Fourth, providers are often required to treat uninsured patients for free.  These costs must be passed on to other patients.

All the factors above conspire to raise the prices that providers charge for their services.

Fifth, providers and insurers have a huge information advantage over patients.  Only providers know what treatment is actually needed and what is optional (although as mentioned above, they try to err on the safe side).  Only providers know how much they charge for procedures, because they don't publish a price list.  Only insurers really know what's covered and what's not; coverage is big and complicated, so knowing what's in and what's out is hard, and even if something is covered, insurers don't publish a price (reimbursement) list either, so you don't know what they're actually going to pay.  This information gap translates into higher costs for patients.

With all the above in mind, I propose a few rules that might help:
  1. Require providers to publish a price list, in hard copy on the premises (on the wall, for example), as well as online, for the procedures they perform.
  2. Require insurers to publish reimbursements for common procedures, in the same way.
  3. Require providers to get patient approval before ordering any procedure (except in life-or-death situations where the patient is incapacitated of course).  The doctor must review the procedure with the patient, and the review must include the price charged, the amount to be reimbursed by the insurer, and the resulting out-of-pocket cost.
  4. Require insurers to work with all providers, and vice versa, for the same price.  The whole "in-network/out-of-network" thing is an unnecessary complication that does not add value to the system as a whole.  This rule would greatly simplify coverage and help reduce the information gap between insurers and patients.
  5. Require everyone living in the U.S. to be covered by an insurance plan.  This idea is part of current proposals.  Although I'm a Democrat, I'm ambivalent about the idea of a 'public option', and I feel that President Obama is right when he says that the public option is not the only way to achieve the goal.  But it's in everyone's interest that everyone be covered.
  6. Because prices and reimbursements are fixed, the patient must pay any shortfall between the price and reimbursement if the reimbursement does not cover it.  Conversely, the patient pockets any reimbursement over what the provider charges (I wouldn't expect the latter situation to occur very often, if at all, though).  This gives the patient a direct incentive to shop around, which in turn gives providers an incentive to keep prices down and insurers an incentive to keep reimbursements up.

I envision a web site where the price and reimbursement lists for all providers and insurers are centralized.  Patients would be able to go to the site, cross-index a provider and insurance plan, and see exactly what they would pay for any given procedure or set of procedures.  Online tools could allow easy comparison of providers and insurers, similar to existing e-commerce sites.

While there are other proposals that may also have merit, the ones above are designed specifically to address the functioning of the health care system as a market.

General | Society

Trust and Agility

by msimpson 5/29/2009 5:57:00 AM

One adage that I've kept in mind for a few years is "trust makes the world go 'round".  By this I mean that there are many, many cases where mutual trust allows things to go much smoother and quicker, whereas a lack of trust creates extra work.  Trust greases the wheels, so to speak, while a lack of trust throws a wrench into them. 

There are many examples to which one can point: the old security-vs.-convenience tradeoff being probably the most obvious one.  A password prompt is an annoyance and slows you down, but is a necessity when a system can't automatically trust someone connecting to it.  On a much larger scale, you could say that our system of laws (and lawyers!) is an artifact of our lack of trust in each other.

But I thought I'd point out a more specific example.  Last fall a coworker and I took a class called "Agile Bootcamp: A Hybrid Scrum Approach" taught by Steve Davis.  The class itself was excellent, and I learned a lot about agile practices and ideals.

One of the goals of agile development is to streamline the development process.  Typically this involves reducing the amount of documentation and specification that is done.  Contrary to popular belief, agile proponents are not necessarily against documentation, but they are against unnecessary documentation.  In particular they are against documentation done too far in advance of implementation, because requirements change.

But there are multiple purposes for this documentation.  A design document not only tells a developer what to build, but it also formalizes to the product owner what they're getting.  This formalization is important in an environment where the product owner and the development team don't fully trust each other, such as exists when development of an application is outsourced to another company. 

I think that this aspect of the relationship between customer and development team is often overlooked by agile proponents.  But ironically, a large part of agile practices deal with formalizing the relationship more strongly, whether a contract is used or not.  While agile strongly encourages constant communication between the two parties, it also defines how that interaction should take place. 

You could say that the process replaces the contract - but only to a degree.  It doesn't provide the "guarantee" of a dollar figure or fixed set of deliverables.  You could say that that's a good thing, that that kind of guarantee is only illusory anyway.  Regardless, lack of a guarantee is an obstacle to trust, until the development team has proven its worth.

My company, a consulting company, has struggled with this dilemma since we started trying to use agile practices.  Historically, we have proposed a design up front for a fixed price, and accepted the risk of going over budget or past our due date.  This creates strong incentives for us to fully specify the design up front (since we are bound to the fixed price we estimate for it), and to limit changes to the requirements, or at least write in assumptions that they won't change, and charge if they do. 

In such a model, the fixed price satisfies the client, while the fixed set of requirements satisfies the development team.  The client's ideal world would be to pay a fixed price but be able to change what gets built at will, while the development team wants a fixed design but to charge on a time-and-materials basis.

In an agile model, the client pretty much determines priorities and can choose what gets built each iteration.  Detailed design documentation, if done at all, is deferred until right before implementation.  We can't specify a fixed design and price up front.  This makes agile development better suited to time-and-materials billing instead of fixed-price. 

More importantly though, it requires a lot of trust between both parties.  That need for trust makes agile better suited to internal development than to outsourced development, because there is usually more trust within a company than between a client and an outside vendor.  Trust is what allows agile development to be more efficient than waterfall development.

I've often wondered whether it would be possible to use a hybrid contract model, where the contract with the client is fixed-price, with a fixed set of deliverables, but the project is run internally (by the consulting company) on an agile basis.  The contract would probably need some adjustments, but (and this is key) the assumption of risk would be the same.  The consulting company would still have to estimate what it will take to build the entire feature set, and plan and price accordingly.  The client still gets a guarantee, but at a cost presumably built into the fixed price.  The consulting company gets to gain experience with agile methods, without losing work to its rivals.  You could say that this hybrid model is not truly "agile".  That's true.  But it might be a viable stepping stone to build trust between the two companies.

Software | General

What the #%@! Is REST?

by msimpson 5/21/2009 2:03:00 PM

I'm currently rewriting this site using ASP.NET MVC, REST, Dojo, and some other technologies.  I've mentioned this to friends and coworkers, and many have asked what the big deal about REST is.  I admit that until one really examines it closely, it's pretty underwhelming.  "HTTP requests?  I've been doing that for years!"  I was in that skeptical camp for a long time, until I bought and read a book a friend of mine recommended.  But now I have a fuller appreciation of what REST is about and what distinguishes it from RPC-style architectures.  There's already a lot of material on the web about REST, but I thought I would describe it in my own words - hence this blog post.

About REST 

REST ("Representational State Transfer") is a term coined by Roy Fielding in his well-known dissertation.  I think one can build a RESTful application without fully understanding what Roy means by the term, but you could say that REST is "how the web is designed to work" - meaning, in specific terms, that it leverages HTTP.

Now I'll put on my Captain Obvious hat and point out some things about HTTP.  It defines a few key things - resources for example, which are documents, data, and other things that are accessible via the web, and also a very limited set of verbs ("get", "post", "put", "delete", etc.) for operating on those resources.  This is an important point - there are only a few standard verbs, but a potentially infinite set of "nouns" (resources).  REST people speak of HTTP's constrained set of verbs as HTTP's uniform interface.

Now imagine an application with a potentially unlimited set of nouns, PLUS a potentially unlimited set of verbs.  That's an RPC-style application.  RPC-style apps have a much less uniform interface, because they have an infinite set of verbs.
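The difference is easy to see side by side.  In this hypothetical C# sketch, IPersonService is an RPC-style contract whose verbs grow with the application, while UniformStore is an in-memory stand-in for HTTP that accepts only GET, PUT and DELETE and applies them the same way to any URI.  Returning either a status line or the stored body as a single string is a simplification made for brevity:

```csharp
using System.Collections.Generic;

// RPC style (hypothetical contract): each new operation is a new verb.
public interface IPersonService {
    string GetPerson(int id);
    void RenamePerson(int id, string firstName, string lastName);
    void PromotePerson(int id);
    void ArchivePerson(int id);
    // ...and the list of verbs keeps growing with the application.
}

// Uniform interface (an in-memory stand-in for HTTP): a few fixed verbs,
// applied the same way to any resource URI.
public class UniformStore {
    private readonly Dictionary<string, string> _resources = new Dictionary<string, string>();

    public string Handle(string method, string uri, string body = null) {
        switch (method) {
            case "GET":
                return _resources.TryGetValue(uri, out string representation)
                    ? representation
                    : "404 Not Found";
            case "PUT":
                bool created = !_resources.ContainsKey(uri);
                _resources[uri] = body;
                return created ? "201 Created" : "200 OK";
            case "DELETE":
                return _resources.Remove(uri) ? "200 OK" : "404 Not Found";
            default:
                return "405 Method Not Allowed";
        }
    }
}
```

Adding a new kind of resource to UniformStore means adding new URIs, not new methods; in the RPC contract, "archive a person" needed a verb of its own.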

Typically web services based on an RPC model use SOAP, a protocol that involves stuffing all the information necessary for a request (including the verb info) into a data structure, and sending that to the server using some transport mechanism.  SOAP itself doesn't require any particular transport, but almost all implementations use HTTP for transport. 

However, because all the info about the procedure call is in the SOAP "envelope", in effect SOAP rides on top of HTTP without leveraging it for anything but simple transport.  You could say SOAP is a tunneling protocol.

This is innocuous enough, and in fact SOAP does bring some things to the table that are not in HTTP, but for many applications, HTTP would suffice.  HTTP already contains mechanisms for authentication, caching, content expiration, error signaling (via status codes), addressability, and other useful things.  In some cases (like authentication), SOAP and its extensions reinvent the wheel.

With that background, I'll define a few more things central to REST.

A resource is the logical noun you're working with; strictly speaking, it doesn't physically exist.  The resource state lives on the server and is not exposed directly.  Instead, the client deals with representations of a resource, which exist on a temporary basis.

A representation is simply some subset of the resource's information, in some format meaningful to the recipient.  If your logical resource is a "person", maintained in your server database as a record in a Person table with fields for PersonID, FirstName, LastName, etc., then the person record in the database is not the resource itself - it's the resource state.  The resource is the logical person - the person's soul as opposed to its body, to put it in pseudoreligious terms.   A representation might be a chunk of JSON data sent across the wire listing some subset of that information (or all of it).  Or it might be XML.  Or HTML.  Or plain text.  You can have multiple representations of the same resource.  HTTP and REST do not specify what the format is, just that there is one.
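Here is a small C# sketch of the person example: PersonRecord plays the role of the resource state, and three methods produce three different representations of the same logical person.  The class and method names are hypothetical, and the sketch assumes .NET Core 3.0 or later for System.Text.Json (System.Xml.Linq produces the XML):

```csharp
using System.Text.Json;
using System.Xml.Linq;

// Resource state: what the server keeps, e.g. a row in a Person table.
// (Hypothetical class; assumes .NET Core 3.0+ for System.Text.Json.)
public class PersonRecord {
    public int PersonId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Representations: short-lived, formatted views of the same logical resource.
public static class PersonRepresentations {
    public static string ToJson(PersonRecord p) {
        return JsonSerializer.Serialize(new {
            id = p.PersonId,
            firstName = p.FirstName,
            lastName = p.LastName
        });
    }

    public static string ToXml(PersonRecord p) {
        var element = new XElement("person",
            new XAttribute("id", p.PersonId),
            new XElement("firstName", p.FirstName),
            new XElement("lastName", p.LastName));
        return element.ToString(SaveOptions.DisableFormatting);
    }

    // A partial representation: only a subset of the resource's information.
    public static string ToText(PersonRecord p) {
        return p.FirstName + " " + p.LastName;
    }
}
```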

When a client requests data, works with it, updates it, deletes it, etc., there is another kind of state, called application state, which describes where the client is in its process, the current values of form fields, the history of user actions, etc.  With REST, application state may only be maintained on the client, while resource state may only be maintained on the server.  This puts the client in control of where it goes and what it does.  It also allows the server to be what is typically just called stateless - the server forgets about the client in between requests.  This allows the server to be simpler and scale better.

A URI provides addressability for a resource - the ability to reach a unique resource given an address.  Each resource might be exposed via multiple URIs, but each URI points to only one resource.  In the example above, you might have a URI like this:  http://www.myserver.com/person/123, where '123' is the person ID.  You could also have http://www.myserver.com/simpson/mike (if first + last name is unique).  When requesting a representation of the resource, you might specify it in the URI (http://www.myserver.com/person/123/xml), or use the "Accept" HTTP header.
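A rough sketch of how a server might map those URIs to a resource and pick a representation.  In this sketch a format segment in the URI takes precedence over the Accept header; that ordering is a design choice, not a rule.  PersonRoute.Resolve is a hypothetical helper, it only understands the /person/{id} form (not /simpson/mike), and it ignores Accept-header quality values, which a real content-negotiation implementation would honor:

```csharp
using System.Linq;

// Hypothetical router for URIs like /person/123 and /person/123/xml.
// Returns the person ID and chosen format, or null if the path doesn't match.
public static class PersonRoute {
    public static (int Id, string Format)? Resolve(string path, string accept) {
        string[] segments = path.Trim('/').Split('/');

        if (segments.Length < 2 || segments.Length > 3 || segments[0] != "person"
                || !int.TryParse(segments[1], out int id)) {
            return null;
        }

        // 1) Format named explicitly in the URI, e.g. /person/123/xml
        if (segments.Length == 3) {
            return (id, segments[2].ToLowerInvariant());
        }

        // 2) Otherwise use the first recognized media type in the Accept header.
        // (q-values are ignored in this sketch.)
        var mediaTypes = (accept ?? "").Split(',').Select(t => t.Split(';')[0].Trim());

        foreach (string mediaType in mediaTypes) {
            if (mediaType == "application/json") {
                return (id, "json");
            }

            if (mediaType == "application/xml" || mediaType == "text/xml") {
                return (id, "xml");
            }

            if (mediaType == "text/html") {
                return (id, "html");
            }
        }

        return (id, "json"); // default representation
    }
}
```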

A third principle, after statelessness and addressability, is connectedness.  This just means that the client isn't required to know information that's out-of-band in order to follow from one resource to another or from one representation to another.  All the information needed is supplied in the representation returned in the current request.  In practical terms, this means doing things like returning URIs of related resources, instead of just their IDs.
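For example, a representation of a person might link to that person's friends by URI.  The Person class, its FriendIds property and the self/friends fields are hypothetical; the host name is the one from the URI examples above, and the sketch assumes .NET Core 3.0 or later for System.Text.Json.  The client can follow each link as-is, without knowing how the server builds person URIs:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical resource state for a person and the IDs of related people.
public class Person {
    public int Id { get; set; }
    public string Name { get; set; }
    public List<int> FriendIds { get; set; } = new List<int>();
}

public static class ConnectedRepresentation {
    // Base URI borrowed from the example host name in the text.
    private const string BaseUri = "http://www.myserver.com";

    public static string PersonUri(int id) => BaseUri + "/person/" + id;

    public static string ToJson(Person p) {
        return JsonSerializer.Serialize(new {
            self = PersonUri(p.Id),
            name = p.Name,
            // Links the client can follow directly, instead of bare IDs that it
            // would need out-of-band knowledge to turn into URIs.
            friends = p.FriendIds.Select(id => PersonUri(id)).ToList()
        });
    }
}
```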

Advantages of REST

Here are some advantages of REST, as compared to SOAP/RPC:

  • REST is simple and extremely interoperable.  Everything speaks HTTP.  That makes REST suitable for public-facing applications that might be used by anyone.
  • REST is addressable - a unique URI exists for every resource.  By contrast, SOAP requests are usually funneled through a single URI for the service as a whole, so you can't refer to a particular resource or action directly.  You have to embed that stuff in the SOAP envelope.
  • REST has great performance.  There is no heavyweight SOAP XML to parse (at least not necessarily).  With no session state, you avoid serialization costs on the server side.  True, you may have more requests, and you may need to push more representation data across the wire, but it's usually no worse than ASP.NET Web Forms viewstate (which ASP.NET MVC does away with, by the way), and HTTP traffic is easily compressed anyway.
  • REST has great scalability.  Stateless servers are easily scaled out across a web farm (as Clay pointed out in the comments, you can also run RPC-style apps statelessly, but it's optional for them).  Plus, you can take advantage of HTTP's extensive capabilities for caching and optimization.
  • REST works better with AJAX.  A REST service can easily return data in JSON or another format that JavaScript can parse.  It is impractical for JavaScript to parse a SOAP response.
  • REST is more easily testable.  Ignoring "proper" unit testing for the purpose of this discussion, you can simply use a browser to hit the service directly.  You aren't going to do that with a SOAP service.

Disadvantages of REST

Nothing is perfect.  Here is where REST falls down:

  • Although there is a standard of sorts (WADL) for declaring the interface a REST-based site presents, it's not widely used at all.  It's also possible to describe a REST interface fairly well with WSDL 2.0.  But many people disagree that such description is necessary or desirable.  If you feel you do need to be able to describe your REST interface to clients, there is no standard that is likely to work across the board.
  • REST and HTTP by themselves do not address some more complicated scenarios that are specifically addressed by SOAP and its WS-* extensions - things like transactions, for example.  It is possible to model a transaction with REST by defining a resource for the entire transaction state, and handling the multiple parts behind the scenes on the server, but this feels somewhat clunky.  I would generalize this to say that REST presents an "impedance mismatch" when the problem is more naturally expressed in terms of verbs as opposed to nouns.
  • Tooling for SOAP is great these days.  Basically you can point your client at a service, push a button, and start calling the service on the spot.  By contrast, REST's lack of standardization and description means that manual effort will be needed to figure out what the interface looks like and how to call it correctly.  By and large this is easy to do because REST is simple, but it's still work that a SOAP app can simply avoid.

Choosing Between REST and SOAP/RPC

Myself, I don't think any one approach is right for all situations.  These days, I would tend to use REST by default, unless the requirements couldn't easily be met with it.  In such cases I would go to SOAP with all its bells, whistles, foghorns, blinking lights, etc. etc.

REST and ASP.NET MVC

REST by itself does not say anything about what server technology to use.  In the Microsoft realm, Microsofties might say to use WCF to expose RESTful services.  This is certainly a good option, but ASP.NET MVC is also a good option.  While it's not RESTful by default because the action (the verb) is present in the URI as opposed to being driven by the HTTP request verb, ASP.NET MVC allows flexible routing and is very easily reconfigured to be RESTful.  It also gives you absolute control over the representations you serve to the client.  Bottom line, it's a perfect fit once reconfigured.

My experience working with REST and ASP.NET MVC has been a joy.  Having been used to large WebForms applications with big session state and authentication that depends on it, I've found it refreshing to rebuild my site (which would have blown away session state were I using it) and continue working without even having to log in again.  I can set it up such that I can close the browser, reopen it and keep working without logging in again, too.   It makes for a streamlined, pleasant development cycle.

Guidelines for a RESTful Application

With all that in mind, here are some guidelines I distilled from reading the book mentioned above:

URI Design

  • Make URIs descriptive (one school of thought says make them descriptive, and another says make them opaque; the author favors the former school).
  • One resource can map to one or more URIs, but each URI can map to only one resource.  Use the minimum number of URIs necessary.
  • Version your application or service; consider encoding the version number into the URI:  /v1/users/msimpson.
  • Model operations (such as transactions) that do not fit the standard methods as resources themselves.
  • Use path variables to encode hierarchy: /parent/child.
  • Put punctuation characters in path variables to avoid implying hierarchy where none exists:  /parent/child1;child2.  Use commas when the order of the scoping information is important, and semicolons when the order doesn’t matter.
  • Use query variables to imply inputs into an algorithm, for example: /search?q=wingnut&start=10.
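The URI guidelines above can be sketched with a few helper functions. This is a minimal Python illustration under stated assumptions: the helper names are hypothetical, and the example paths come from the list above, plus a made-up /map coordinate example for the comma case.

```python
from urllib.parse import quote, urlencode
from typing import List


def path_uri(*segments: str) -> str:
    """Encode hierarchy as successive path segments: /parent/child.

    ';' and ',' are left unescaped so a segment can carry scoping
    punctuation; other reserved characters (including '/') are percent-encoded.
    """
    return "/" + "/".join(quote(s, safe=";,") for s in segments)


def scoped(values: List[str], ordered: bool) -> str:
    """Join sibling values into one segment without implying hierarchy:
    commas when the order matters, semicolons when it does not."""
    return ("," if ordered else ";").join(values)


def query_uri(path: str, **params: object) -> str:
    """Inputs to an algorithm (such as a search) go in the query string."""
    return f"{path}?{urlencode(params)}"
```

For example, path_uri("v1", "users", "msimpson") yields the versioned URI /v1/users/msimpson from the list above.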

Representations

  • Design your representations, both incoming (request) and outgoing (response).  I would go through an analysis process up front to:
    • designate my resources;
    • design my URI structure;
    • define what verbs are allowed for each URI;
    • define the representations accepted and served by each combination of URI and verb.  It may not be necessary to do it all up front, but there should at least be conventions defined up front.
  • Representations should link to related resources/states.
  • Representations need not convey the entire state of a resource, only some of it.
  • Where an entity-body is needed for a given request, consider exposing a form so that the client will be able to figure out how to make the request.

Resource State

  • When creating a subordinate resource, use PUT when the client is in charge of determining the resource URI, and POST when the server is in charge.
  • Don’t allow clients to PUT representations that change a resource’s state in relative terms.
  • Don’t expose unsafe operations (operations that change resource state) through GET.
  • HEAD, GET, PUT and DELETE should be idempotent: making the same request repeatedly should leave the resource in the same state as making it once.
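The difference between an absolute PUT and a relative change is easiest to see on a toy resource. In this minimal Python sketch, the hypothetical AccountResource class and its representations are assumptions for illustration only.

```python
class AccountResource:
    """Toy in-memory resource used to contrast absolute and relative updates."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def put(self, representation: dict) -> None:
        # Absolute update: the representation states the new value outright,
        # so repeating the request leaves the resource in the same state.
        self.balance = representation["balance"]

    def relative_update(self, representation: dict) -> None:
        # Relative update ("add 50"): every repeat changes the state again.
        # This is the kind of representation the guideline says not to PUT.
        self.balance += representation["delta"]
```

If a client retries the PUT after a timeout, nothing goes wrong; retrying the relative update silently double-applies it.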

Application State

  • Don’t use cookies, even for a simple session ID, unless the client is in charge of the cookie value.
  • Authenticate on every request, rather than maintaining a server session, which breaks statelessness.  Authentication can be done using any of several different mechanisms, but credentials should probably not be part of the URI.  Consider HTTP Basic or Digest authentication, or a similar solution built on the WWW-Authenticate response header and the Authorization request header.
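For illustration, here is roughly what HTTP Basic authentication on every request looks like, as a minimal Python sketch with hypothetical function names and a hypothetical realm. Basic credentials are only base64-encoded, not encrypted, so in practice they should travel over HTTPS.

```python
import base64
from typing import Optional, Tuple

# Hypothetical challenge a server sends with a 401 response.
WWW_AUTHENTICATE_CHALLENGE = 'Basic realm="api"'


def basic_authorization_header(username: str, password: str) -> str:
    """Value a client sends in the Authorization request header on each request."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_authorization(header: str) -> Optional[Tuple[str, str]]:
    """Server side: recover (username, password), or None if not valid Basic credentials."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None
```

Because the credentials arrive with every request, the server never needs a session to remember who the caller is.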

HTTP

  • Use the standard HTTP methods (chiefly HEAD, GET, PUT, POST and DELETE) to indicate what you’re trying to do.
  • Use conditional GET (response headers Last-Modified and ETag, and request headers If-Modified-Since and If-None-Match).
  • Allow the client to cache data when appropriate, through the use of the Cache-Control response header.
  • Allow the client to make Look-Before-You-Leap (LBYL) requests, using the “Expect: 100-continue” request header.
  • Don’t put error messages in the representation; rather, use HTTP error codes appropriately.
  • Set the Content-Location response header if the URI requested is not the “primary” URI for the resource.
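Conditional GET can be sketched as a small function that sets the validators and decides between 200 and 304. This is a simplified Python illustration, not a complete implementation of the HTTP rules: it derives a strong ETag from a hash of the body (one choice among several), treats If-None-Match as taking precedence over If-Modified-Since, and ignores weak ETags. The function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    # Hashing the body saves bandwidth, not server work; a version number or
    # database timestamp would avoid building the body at all.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_get(body: bytes, last_modified: datetime,
                    request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body); last_modified must be UTC-aware."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Last-Modified": format_datetime(last_modified, usegmt=True)}

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""
        return 200, headers, body

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""
    return 200, headers, body
```

A 304 response carries no body, so a client that already holds the current representation pays only for the headers.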

Other

  • Be careful with Ajax, as it can break addressability and statelessness.  But paradoxically, it can actually allow statelessness if used well.  If used, employ an Ajax framework to hide things like browser differences, and include equivalents to browser navigation in the application.

Software

Real-World Validation

by msimpson 5/12/2009 8:10:00 AM

I'm in the middle of rewriting this site using ASP.NET MVC, and am wrestling with the age-old question of how best to validate my data.  Like every developer, I've addressed this in different ways in the past, and thought I would take some time to think through the problem systematically and figure out the 'best' solution for my needs.

The first thing to do is examine the problem.  Here are the goals and objectives as I see them:

  • Ensure that all input data is validated before it is persisted;
  • Keep my domain model clean at all times, to the degree possible;
  • Provide enough flexibility to handle complex business logic;
  • Expose validation through the entity model, while avoiding or at least minimizing dependencies on it;
  • Gracefully handle multiple levels or stages of validation without unduly duplicating code;
  • Make it easy to write validation code by providing 'helper' methods for common types of validation;
  • Facilitate a good user experience by making it easy to collect all validation errors and display them.

As I see it, there are various ways in which data validation approaches differ from one another.  The first is when the validation occurs: it can either be done immediately when working with the entity model, or it can be deferred until later and done all at once.  It seems clear that deferred validation is significantly more flexible, but at the cost of removing your ability to rely on the domain model always being valid.

The second characteristic is division of responsibility, in other words who's validating whom. I differentiate between the local approach, where domain objects validate themselves, vs. the remote approach where a validator object validates another object.  The remote approach seems more flexible here, but there are a lot of options that blur the lines - validators that exist as separate objects but are incorporated into entities via composition, for example, a la the Strategy pattern.

The third aspect is support for arbitrary context.  Complex validation often depends on things besides the entity's current internal state; some validation approaches support passing context, and some don't.

As a practical example of context, my team wrote a parser for ACH data that's used both when originating files and receiving files.  The structural model for the data is always the same, but the validation rules for the two scenarios are vastly different - when originating, we are much more strict than when receiving.  Also, the current date is a factor, because rules need to take effect on certain configurable dates, and expire on other dates.  We can't simply use DateTime.Now, because we might want to validate 'as of' a certain date.  We need to pass the date, and the current originate/receive scenario, as context to inform the validation.
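Here is one way such a context might look, as a minimal Python sketch. The Scenario, ValidationContext and Rule types and the rule names are hypothetical. The point is that the 'as of' date and the originate/receive scenario are passed in rather than read from the clock.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Scenario(Enum):
    ORIGINATE = "originate"
    RECEIVE = "receive"


@dataclass(frozen=True)
class ValidationContext:
    as_of: date          # validate 'as of' this date, not DateTime.Now
    scenario: Scenario   # originating is stricter than receiving


@dataclass(frozen=True)
class Rule:
    name: str
    effective: date                  # rule takes effect on this date
    expires: Optional[date] = None   # and stops applying on this date
    originate_only: bool = False     # skipped when receiving files

    def applies(self, ctx: ValidationContext) -> bool:
        if ctx.as_of < self.effective:
            return False
        if self.expires is not None and ctx.as_of >= self.expires:
            return False
        return not (self.originate_only and ctx.scenario is Scenario.RECEIVE)


def active_rules(rules: List[Rule], ctx: ValidationContext) -> List[str]:
    """Names of the rules that should run for this context."""
    return [r.name for r in rules if r.applies(ctx)]
```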

When reviewing the approaches above against our goals, it's clear we need to make some tradeoffs.  For example, 'immediate local' validation (via property setters for example) ensures that input is validated, and helps keep the domain model clean at all times.  DDD purists might favor this approach.

But continuing with the above example, we don't have any control over when the validation runs, and if properties are used (instead of methods), then we can't pass context unless we set some reference to it ahead of time.  Immediate validation can also make it hard to collect validation errors and display them to the user; if an exception occurs, do we stop in our tracks and display the error, or keep going?  The calling code has to create and manage the exception collection.  Any and all code that sets properties on an entity needs to be prepared to handle validation exceptions and deal with them appropriately.

Earlier I mentioned different levels or stages of validation.  Not all validation occurs within the domain entities.  For example, when one is creating a new user account, there may be an email and/or password confirmation field that is not reflected in the entity.  This type of thing needs to be validated independently of the normal User entity validation. 

And there may be larger-scale validation that occurs between or across entities, and not just within them.  In the ACH parser example above, the file has an internal hierarchy of records.  Each record in the file becomes an entity with its own internal validation, but there are control records in the file that contain counts and totals of the previous records.  In this particular case, since it's a single-parent hierarchy, it's easy enough to decide where this cross-entity validation should go; in DDD terms, the file might be designated an 'aggregate root' that is responsible for everything it contains.  But one can easily imagine validation that might need to occur across nominally unrelated entities, due to their participating in the same business process.
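A greatly simplified Python sketch of that cross-entity check follows. Real ACH control records carry more fields (such as separate debit and credit totals and an entry hash). The classes and the single total here are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EntryRecord:
    amount_cents: int


@dataclass
class ControlRecord:
    entry_count: int
    total_amount_cents: int


@dataclass
class Batch:
    entries: List[EntryRecord]
    control: ControlRecord

    def validate_totals(self) -> List[str]:
        """Aggregate-root check: the control record must agree with the
        records it summarizes. Returns error messages (empty if valid)."""
        errors = []
        if self.control.entry_count != len(self.entries):
            errors.append(
                f"control count {self.control.entry_count} != actual {len(self.entries)}")
        actual_total = sum(e.amount_cents for e in self.entries)
        if self.control.total_amount_cents != actual_total:
            errors.append(
                f"control total {self.control.total_amount_cents} != actual {actual_total}")
        return errors
```

Neither an individual entry nor the control record can perform this check alone, which is why it lives on the parent.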

So what's the best general approach?  In Scott Guthrie's Nerd Dinner tutorial, he uses what I would call 'deferred local' validation.  He extends his LINQ-to-SQL entity objects via partial classes, and adds support for deferred validation with a custom API.  This approach works well enough, but I feel that partial classes are too tight a coupling to the entity objects, and don't address cross-entity and extra-entity validation.

I've chosen to go with deferred remote validation.  Remote validation keeps me from getting too intimate with my Entity Framework entities, and deferred validation is simply more flexible.  I've decided to live with my domain model being suspect until I validate it.  In some cases it's actually desirable to allow your domain model to become invalid.  Going back (again) to the ACH example, as I mentioned, when we receive files we are pretty loose about validation rules.  We want to go ahead and receive the file but note the problems we found with it.  If we allow the model to exist in an invalid state, we can use it as input to a reporting mechanism, for example.

Anyway, I've defined two interfaces, ILocalValidator and IRemoteValidator, that contain two methods each:  IsValid() and Validate().  Each of these methods takes a context object (currently declared simply as 'object' though I may define this more tightly later).  IRemoteValidator also takes a reference to the object to be validated, whereas ILocalValidator does not; this reflects the semantic difference between them.  ILocalValidator also includes a method that allows setting a reference to the object to be validated.

Implementing these is a class named AbstractValidator<T> that contains a number of helpful methods for common validation tasks.  T is constrained to be an EntityObject, in my case.  To create a validator, all one needs to do is create a class that inherits from AbstractValidator and further constrains T to be a specific type of entity object, and override one method to perform the validation.
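A rough Python analogue of this arrangement follows. Because Python has no method overloading, this sketch folds the local form (target installed beforehand) and the remote form (target passed in) into one validate() method with an optional target. The method names, helpers, and User entity are hypothetical, and the EntityObject constraint is not modeled.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, List, Optional, TypeVar

T = TypeVar("T")


class AbstractValidator(ABC, Generic[T]):
    """Deferred validator usable 'remotely' (target passed in) or
    'locally' (target installed beforehand via set_target)."""

    def __init__(self) -> None:
        self._target: Optional[T] = None
        self._errors: List[str] = []

    def set_target(self, target: T) -> None:
        self._target = target

    def validate(self, context: Any = None, target: Optional[T] = None) -> List[str]:
        """Run all checks and return every error found, not just the first."""
        subject = target if target is not None else self._target
        if subject is None:
            raise ValueError("no object to validate")
        self._errors = []
        self.check(subject, context)
        return list(self._errors)

    def is_valid(self, context: Any = None, target: Optional[T] = None) -> bool:
        return not self.validate(context, target)

    # Helpers for common validation tasks.
    def require(self, value: Any, field: str) -> None:
        if value is None or (isinstance(value, str) and not value.strip()):
            self._errors.append(f"{field} is required")

    def max_length(self, value: Optional[str], field: str, limit: int) -> None:
        if value is not None and len(value) > limit:
            self._errors.append(f"{field} must be at most {limit} characters")

    @abstractmethod
    def check(self, target: T, context: Any) -> None:
        """Override to perform validation, recording errors via the helpers."""


@dataclass
class User:
    name: str
    email: str


class UserValidator(AbstractValidator[User]):
    def check(self, target: User, context: Any) -> None:
        self.require(target.name, "name")
        self.max_length(target.name, "name", 50)
        self.require(target.email, "email")
```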

For convenience, I have another class called EntityObjectExtensions that hooks the above into the entity objects via extension methods, allowing one to simply say, myEntity.Validate().  It uses a naming convention to look up the appropriate validator class and instantiate it.  Of course it would be easy enough to hook into the validator in other ways, for example by using an IoC container to instantiate the appropriate validator and install it in the entity during construction.
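Python has no extension methods, so in this minimal sketch a free function stands in for myEntity.Validate(). It looks up a class named "<EntityType>Validator" in the module namespace by convention. The User and UserValidator classes and the validate_entity function are hypothetical.

```python
from typing import Any, List


class User:
    def __init__(self, name: str) -> None:
        self.name = name


class UserValidator:
    def validate(self, target: User, context: Any = None) -> List[str]:
        return [] if target.name else ["name is required"]


def validate_entity(entity: Any, context: Any = None) -> List[str]:
    """Find '<EntityType>Validator' in this module by naming convention and run it."""
    validator_name = type(entity).__name__ + "Validator"
    validator_cls = globals().get(validator_name)
    if validator_cls is None:
        raise LookupError(f"no validator class named {validator_name}")
    return validator_cls().validate(entity, context)
```

Swapping the globals() lookup for an IoC container would give the alternative the post mentions, without changing callers.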

Per DDD, I use the Repository pattern for persistence (I implemented this layer a while back to hide EF).  To ensure that only valid objects are persisted, I call validation from my repository classes before I persist them.
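The repository-side guard might look like this minimal Python sketch. The Repository and ValidationError classes are hypothetical, the validation function is injected, and an in-memory dict stands in for the Entity Framework persistence the post describes.

```python
from typing import Any, Callable, Dict, List


class ValidationError(Exception):
    def __init__(self, errors: List[str]) -> None:
        super().__init__("; ".join(errors))
        self.errors = errors  # every problem found, for display to the user


class Repository:
    """Refuses to persist entities that fail validation."""

    def __init__(self, validate: Callable[[Any, Any], List[str]]) -> None:
        self._validate = validate
        self._store: Dict[int, Any] = {}

    def save(self, key: int, entity: Any, context: Any = None) -> None:
        errors = self._validate(entity, context)
        if errors:
            raise ValidationError(errors)
        self._store[key] = entity  # stands in for the real persistence call

    def get(self, key: int) -> Any:
        return self._store[key]
```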

This approach avoids touching my entity objects at all, allows me to choose when validation is performed, allows me to pass arbitrary context to my validation routines, and provides reasonable assurance that my domain entities are valid before they are saved to the database.  It also provides both local and remote validation using the same validation code.  Finally, the approach does not preclude cross-entity and extra-entity validation - in fact, in the case above where I have a 'repeat password' field, I validate that in my controller immediately before calling user.Validate().  These are encapsulated in a method in my controller that I call from the Create() and Update() action methods.

A colleague of mine has proposed implementing a 'View Model' (a la Martin Fowler) pattern to further isolate and validate model data going to and coming from the UI.  I have not implemented that (yet), but don't see that it would be difficult or cause any problems.  In fact, if carefully written, the same validator might be reused to validate both the view model and the domain model.

Software

ASP.NET MVC + Entity Framework + IIS 7.0 + DI + REST + Ajax + Dojo = SlipStream 2

by msimpson 5/2/2008 10:18:00 AM

For the past few weeks I've been experimenting with a number of new and not-so-new technologies that seem to combine very well.  The technologies include:

  • ASP.NET MVC:  This is a new framework from Scott Guthrie's team at Microsoft that provides a real model-view-controller implementation for ASP.NET.  The framework includes a robust URL routing engine that is flexible enough to do real RESTful routing, and can generate URLs from the routes (allowing you to change your routing later without breaking your app).  You don't need to use MVC to use the routing engine either.
  • IIS 7.0: The new version of IIS is lean and modular, and supports integration with ASP.NET applications.  This allows individual apps to plug into the processing pipeline and completely take over processing from IIS.  For me, this allows custom authentication plus clean RESTful URLs with no file extensions.
  • ADO.NET Entity Framework:  This is the new ORM tool from Microsoft.  It can reverse-engineer your database and generate a data model with mapping between the physical database, logical data model and entity objects.  A colleague and I have written a handful of serious ORM frameworks, and although so far I've really only kicked EF's tires, it seems to do everything I need.  A key feature for me is that it supports querying using LINQ (LINQ to Entities), among other methods.
  • DI: Dependency Injection is a technique for decoupling the components of an application to enable future extensibility and better testing.  I haven't started implementing this yet in this app, but I plan to very soon.  A couple of frameworks I'm looking at are Unity and Ninject.
  • REST: Not so much a technology as a mindset, REST has intrigued me ever since I read RESTful Web Services a few months ago.  Microsoft tools haven't supported it well at all - until now.  I thought this was a good time to explore an alternative (OK, "non-Microsoft") way to design and build web applications.
  • Ajax:  I've used Ajax in web apps for several years now, but only for specific, narrowly-defined purposes.  I wanted to build an app that leveraged it to the hilt, and an app based on RESTful design seemed to present the best opportunity to do so.
  • Dojo:  This is a client-side JavaScript framework that provides a cornucopia of capabilities including Ajax support, widgets, fancy UI effects, event wiring, etc. etc.  In many cases it can eliminate the need for server-side controls.
  • CSS:  OK, this is not new, and I've used CSS for years; but I wanted to create an app that leverages CSS for all style information, and avoids cluttering the document with it.  In particular, I wanted to avoid using tables for layout, except for tabular data.

For my guinea pig, I chose to reimplement the web application that supports my audio product for Second Life, SlipStream.  The new version will be completely web-driven, eliminating almost all notecard configuration within the device in-world.  The system must do the following:

  • Allow management of customers, plus their devices, playlists, managers, etc. from both a browser-based web UI and scripts in Second Life (SL);
  • Allow indirect but real-time searching of the Shoutcast directory;
  • Provide configuration and stream information to device scripts in SL;
  • Track the current state of devices in SL, based on the calls they make to the application.

The requirement to support both SL script clients and web browsers causes some issues.  First, the scripting language in Second Life, LSL, is pretty limited when it comes to making HTTP calls, despite being compiled to and run on Mono (this feature is currently in an almost-production stage).  You can't, for example, set custom request headers or check the response headers, and there's no facility for parsing JSON or XML data returned by the call.  Plus, the size of data that can be sent and received is severely restricted, to 255 bytes per call last time I checked.  And HTTP requests are throttled to 100 per 100 seconds per object in SL.

I'm dealing with most of this by carefully designing the resources I expose and the representation formats I use when talking to a script client in SL.  But another wrinkle is authentication - SL scripts will need to authenticate on every call by passing the appropriate query string parameters.  Browser clients, however, are asked to log in only when they try to access a restricted resource, and they will get an authentication token they can send back on subsequent requests.  The authentication scheme is pretty much completely custom as a result.  It consists of an HTTP module that checks for the necessary credentials, and establishes a custom security principal and identity for the current request.

Authorization is done pretty much the normal way.  The app only has three classes of users:  anonymous users, authenticated normal users, and administrators.  For simple, "static" authorization, the controller action methods use a PrincipalPermission attribute to check membership in a role.  However, this type of checking is rarely enough.  Typically I would provide another layer of mapping for static authorization: instead of Users <-> Roles, I would have Users <-> Groups <-> Permissions.  This can be done using the .NET administrative tools, but most people aren't aware of that and don't use them.  Anyway, for this app, the simple role-based authorization suffices for static authorization.

But many apps also have what I call "dynamic" authorization - that is, authorization that is dependent on current application state, typically the resource that the user is trying to access.  An example of this would be a system where user Fred can only access documents he created, but not other people's documents.  In this case, a simple role check won't cut it - you need to find out who authored the document being requested, and then see if that matches the current user.  Anyway, this application has some scenarios like that, and so I have some helper code built into my controllers to do this type of checking.
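The Fred example can be sketched in a few lines of Python. The Document type, the loader callback, and the Forbidden exception are hypothetical. The essential step is loading the requested resource and comparing its author with the current user before returning it.

```python
from dataclasses import dataclass
from typing import Callable


class Forbidden(Exception):
    """Raised when the current user may not access the requested resource."""


@dataclass
class Document:
    doc_id: int
    author: str


def get_document(current_user: str, doc_id: int,
                 load: Callable[[int], Document]) -> Document:
    """Dynamic authorization: the decision depends on the resource itself."""
    document = load(doc_id)
    if document.author != current_user:
        raise Forbidden(f"{current_user} may not access document {doc_id}")
    return document
```

A role check alone could not make this decision, because the answer changes from one document to the next.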

Just for grins, I also want to support conditional HTTP GET on the resources and actions where it's appropriate (which as it turns out, aren't that many, but hey).  So, I wrote some code in a base class for my views that tacks on the appropriate response headers based on the timestamp of the current resource from my database, and examines incoming headers to see if it can optimize the response body away. 

The default routes provided in the MVC templates from Microsoft are not (IMHO) completely RESTful - they include the action in the URL:  for example /{controller}/{action}.  I felt that the action should be expressed by the HTTP method used to make the request, so I reworked the routes accordingly:  /{controller}/{id}/{format}, for example. 
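Here is a minimal Python sketch of dispatching on the HTTP method instead of an action segment in the URL. It assumes a /{controller}/{id}/{format} template with optional id and format segments and a default format of "html". The Router class is hypothetical and is not the ASP.NET MVC routing API.

```python
import re
from typing import Any, Callable, Dict, Optional, Tuple

# /{controller}/{id}/{format}, with id and format optional (an assumption).
ROUTE = re.compile(r"^/(?P<controller>[a-z]+)(?:/(?P<id>\d+))?(?:/(?P<format>[a-z]+))?$")

Handler = Callable[[Optional[str], str], Any]


class Router:
    def __init__(self) -> None:
        self._handlers: Dict[Tuple[str, str], Handler] = {}

    def register(self, method: str, controller: str, handler: Handler) -> None:
        self._handlers[(method.upper(), controller)] = handler

    def dispatch(self, method: str, path: str) -> Tuple[int, Any]:
        match = ROUTE.match(path)
        if match is None:
            return 404, None
        controller = match["controller"]
        if not any(c == controller for _, c in self._handlers):
            return 404, None  # no such resource type at all
        handler = self._handlers.get((method.upper(), controller))
        if handler is None:
            return 405, None  # resource exists, but not for this HTTP method
        return 200, handler(match["id"], match["format"] or "html")
```

The verb now comes from the request itself, so GET and DELETE on /person/123 reach different handlers through the same URI.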

The application is completely stateless; that is, it does not use session state at all.  This allows it to keep working through a restart - the user's authentication token will be accepted on the next request after a restart, and the user never knows the difference.  The user cannot fake an authentication token because it's encrypted on the server using the server's private key.
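Python's standard library has no encryption primitives, so this sketch swaps in a related technique: an HMAC-signed token. Signing does not hide the token's contents the way the post's encrypted token does. It does give the two properties described here: a client cannot forge a valid token, and the token survives a restart as long as the secret key stays the same. The secret, field names, and lifetime are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical placeholder; a real secret should be long, random, and kept out of source.
SERVER_SECRET = b"replace-with-a-long-random-secret"


def issue_token(user: str, ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    issued = time.time() if now is None else now
    payload = json.dumps({"user": user, "exp": int(issued) + ttl_seconds}).encode()
    sig = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the token is authentic and unexpired, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or bad base64
        return None
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["user"] if claims["exp"] > current else None
```

No server-side session is consulted, so any server in a farm, or the same server after a restart, can verify the token.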

On the client side, the app uses CSS for all styling except tabular data, including form layout.  All the forms submit via Ajax, eliminating postbacks and allowing things like password fields to retain their values when you submit.  I'm using Dojo for the Ajax calls and for a few other things here and there... I'll be looking at expanding to use it for all my form widgets, but whether I stick to that approach will depend on whether the resulting code is XHTML- and CSS-compliant.

So far, I have the basics of the application working:  authentication, authorization, fetching data, supplying forms, handling form submissions, etc.  I need to do some more styling and finish building out the rest of the features, but it's starting to come together.  When I get the app done, then it will be time to build out the device scripts in LSL, package the product and start selling! 

Software | Second Life

ASP.NET State Hell

by msimpson 2/18/2008 5:48:00 PM

Let me begin by saying that despite ASP.NET's strengths, its state management leaves some things to be desired.  If you've read some of my other blog posts you might not be surprised to hear me say that... I'm known in my office for being an old codger (at 37) who complains about lots of things.

The first issue is that the state management mechanisms in ASP.NET mix the concepts of scope and lifetime.  It's true that there is a loose correlation between the two; for example, Application state is scoped to all users of a given application, and lasts for the duration of that application.  Session state is accessible to the session, and lasts for the duration of the session.  The third option, ViewState, is scoped roughly to the session (actually more narrowly), and lasts across multiple requests for a single form.  You could say it's scoped to the form, but I think it's useful to define it based on lifetime.

Now examine the statements above.  The phrases "application state" and "session state" are pretty intuitive... but "view state"?  That should have been named "form state", except for the inconvenience of wanting it for individual controls as well as the form.  The distinction between lifetime and scope is never made, perhaps because with only these three options, the distinction is somewhat moot.

But once you're aware of the distinction, other possibilities present themselves.  For example, why can't we have state that persists for the duration of a "business process" (across multiple forms), instead of for the lifetime of a session?  What about state scoped to multiple users - all administrators, for example?  Or state with a lifetime spanning multiple sessions?  All of these would be useful for certain scenarios.

For anything but the three supported options, you're on your own.  Web developers have gotten good at using all the various techniques at their disposal - for instance using cookies to preserve state across sessions, using page-level variables to maintain state during the processing of a single page, etc.

So there's a learning curve issue here in that state is not accessed through a consistent mechanism - new developers must learn not only how to use Application, Session, and ViewState, but cookies, Cache, page variables, hidden HTML fields, HTTP context, thread-local storage, etc. etc. 

The problem is worsened by the fact that each of these mechanisms has implications relating to the physical implementation, so the developer has to worry about those too.  For example, using the logical concept of "state scoped to the form with the lifetime of the form" means using ViewState, which means sending data to the client and back on each request.  With the exception of Session, ASP.NET doesn't really give you any options when it comes to the physical implementation.

As if all that weren't enough, in a multi-tiered application, maintaining state should not be the presentation layer's concern.  When the presentation layer maintains state, one of two things will generally happen:  either the presentation layer will need to pass state to the business layer, or the business layer will need to know more than it should about the presentation layer.  Both are bad.

Finally, none of the state mechanisms in ASP.NET really offer immutability as a way to address multi-threading issues.  This might or might not be considered a fault of the platform, but if it existed and were transparent it would be handy.

In summary, I think there are a number of problems with ASP.NET state management:

  • confusion of scope and lifetime;
  • limited state options;
  • management of state through many independent, inconsistent mechanisms;
  • dependence on the physical implementation of each mechanism;
  • blurring of lines between the presentation layer and business layer;
  • lack of immutability.

What can we do about it?  I propose a state manager that attempts to do the following:

  • manage state based on various combinations of scope and lifetime, providing options beyond what ASP.NET and related facilities offer;
  • expose state capabilities through a single consistent interface;
  • hide the underlying physical implementation of each type of state, and allow use of different implementations where appropriate;
  • explicitly support the business layer, and maintain a clean separation between the business layer and presentation layer.

As I envision it, this state manager would be a class in the business layer, with a limited set of interfaces for supporting "state providers" that would manage the actual state data.  The class would enforce serializability of state data; although this is not required for the underlying implementations of all state providers, requiring it up front preserves the ability to switch implementations.  All access to state data would occur through the class, probably through static methods.  The specification of supported state options, and the mapping of providers to those options, would be done through configuration. 
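A minimal Python sketch of such a state manager might look like the following. Everything here is hypothetical. The enum values mirror the scope and lifetime lists in this post, providers are registered in code rather than through configuration, and lifetimes select a provider but are not enforced (expiry is left to each provider). Pickling every value enforces serializability up front, as described above.

```python
import pickle
from enum import Enum
from typing import Any, Dict, Optional, Protocol, Tuple


class Scope(Enum):
    APPLICATION = "application"
    ROLE = "role"
    USER = "user"
    SESSION = "session"
    BUSINESS_PROCESS = "business_process"
    FORM = "form"
    REQUEST = "request"


class Lifetime(Enum):
    APPLICATION = "application"
    SESSION = "session"
    BUSINESS_PROCESS = "business_process"
    FORM = "form"
    REQUEST = "request"


class StateProvider(Protocol):
    def load(self, key: str) -> Optional[bytes]: ...
    def save(self, key: str, data: bytes) -> None: ...


class InMemoryProvider:
    """Simplest possible provider; others might use a database or the client."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def load(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def save(self, key: str, data: bytes) -> None:
        self._data[key] = data


class StateManager:
    _providers: Dict[Tuple[Scope, Lifetime], StateProvider] = {}

    @classmethod
    def configure(cls, scope: Scope, lifetime: Lifetime, provider: StateProvider) -> None:
        cls._providers[(scope, lifetime)] = provider

    @classmethod
    def set(cls, scope: Scope, lifetime: Lifetime, scope_id: str, name: str, value: Any) -> None:
        data = pickle.dumps(value)  # fails fast if the value is not serializable
        cls._provider(scope, lifetime).save(f"{scope_id}:{name}", data)

    @classmethod
    def get(cls, scope: Scope, lifetime: Lifetime, scope_id: str, name: str,
            default: Any = None) -> Any:
        data = cls._provider(scope, lifetime).load(f"{scope_id}:{name}")
        return default if data is None else pickle.loads(data)

    @classmethod
    def _provider(cls, scope: Scope, lifetime: Lifetime) -> StateProvider:
        try:
            return cls._providers[(scope, lifetime)]
        except KeyError:
            raise LookupError(
                f"no state provider configured for {scope.name}/{lifetime.name}") from None
```

Callers only ever name a scope and a lifetime; which provider actually stores the data can change without touching them.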

The state manager would (ideally) support notions of scope based on the following:

  • Application: accessible by all users of an application;
  • Role: specific to all users with a certain role;
  • User: specific to an individual user (note that this is NOT the same as session);
  • Session*: specific to all requests in a given session.  This has lifetime implications because the session is only valid for a certain length of time;
  • Business Process*: specific to requests associated with a given logical business process.  This also may have lifetime implications due to the finite length of the business process, but at any rate it's separate from session scope.  Consider multiple sessions associated with the same business process, for example;
  • Form*: specific to the processing of a given form, from initial entry (IsPostBack=false) to the user leaving the form.  Equivalent to ViewState;
  • Request*: specific to the processing of a given request.

*Has lifetime implications.

The state manager would support similar options with regard to lifetime:

  • Application: lasts the duration of the application;
  • Session: lasts for the duration of the current session;
  • Business Process: lasts for the duration of a logical business process, which may or may not span multiple sessions;
  • Form: lasts for the duration of processing of a given form, from initial entry (IsPostBack=false) to the user leaving the form.  Equivalent to ViewState;
  • Request: lasts for the duration of processing of a given request.

It might be useful to put the scope vs. lifetime options above on a grid.  Some combinations might be unworkable, but I think a surprising number of them would be useful.

To limit the knowledge the business tier would have of the presentation tier, an inversion-of-control pattern could be used.  Interfaces could expose state capabilities available in the presentation layer, without requiring a direct reference to the System.Web assembly.  The presentation layer would pass a reference to a class (or classes) in that layer that implement the appropriate interfaces, allowing the state manager to use it to take advantage of things like ViewState.

A key goal here is to divorce the logical concepts of state from their physical implementations.  The underlying implementations will always impose constraints - for example, storing session state in the database will make it difficult to know when the session ends.  But once enough providers are implemented, more options will become available.  In time, for example, I might be able to reconfigure the app to store form state (ViewState) on the server rather than sending it to the browser, without having to touch a line of code.

Sure would be nice :-)


Software

Are Foreign Keys Bad? You Decide!

by msimpson 11/14/2007 6:18:00 PM

I'm going to throw out a somewhat radical, even heretical, idea: that foreign key constraints, as traditionally implemented, might not be worth it.  What?!? you say.  Foreign keys are essential to your data integrity?  They're worth the minor performance impact, right?  They're also worth the hassle of dealing with them?  OK, fine.  But read on.

Foreign key constraints have other, less obvious drawbacks.  Consider this: the foreign keys in your database probably make it difficult or impossible for you to prevent deadlocks.  Let's say tables A and B are related by a foreign key (B -> A, so rows in B reference rows in A).  Now let's assume Alice wants to (within a transaction) insert records in both tables, and Bob wants to delete records from both tables.  Because of the foreign keys, Alice must access the tables in this order: A, B (the parent row must exist before a child can reference it), while Bob must access them in this order: B, A (the child rows must go before the parent can be deleted).  Whenever you have multiple clients accessing more than one resource in different orders, you have a recipe for deadlock.  Alice locks data in table A, Bob locks data in table B, and then they wait on each other.  Forever, in principle.
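To see why Alice and Bob can never make progress, here is a simplified Java model of a wait-for graph: an edge from one transaction to another means the first is waiting on a lock the second holds, and a cycle is a deadlock. It ignores lock modes and lock release, and it is not how SQL Server's lock manager is actually implemented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model: exclusive locks only, never released.
class WaitForGraph {
    private final Map<String, String> holderOf = new HashMap<>();      // resource -> holding txn
    private final Map<String, Set<String>> waitsFor = new HashMap<>(); // txn -> txns it waits on

    // Grants the lock if it is free; otherwise records a wait edge.
    void request(String txn, String resource) {
        String holder = holderOf.get(resource);
        if (holder == null) {
            holderOf.put(resource, txn);
        } else if (!holder.equals(txn)) {
            waitsFor.computeIfAbsent(txn, k -> new HashSet<>()).add(holder);
        }
    }

    // A deadlock is any cycle in the wait-for graph.
    boolean hasDeadlock() {
        Set<String> finished = new HashSet<>();
        for (String txn : waitsFor.keySet()) {
            if (reachesCycle(txn, new HashSet<>(), finished)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first search; onPath holds the transactions on the current path.
    private boolean reachesCycle(String txn, Set<String> onPath, Set<String> finished) {
        if (onPath.contains(txn)) return true;
        if (finished.contains(txn)) return false;
        onPath.add(txn);
        for (String next : waitsFor.getOrDefault(txn, Collections.emptySet())) {
            if (reachesCycle(next, onPath, finished)) return true;
        }
        onPath.remove(txn);
        finished.add(txn);
        return false;
    }
}
```

Replaying the scenario: Alice locks A, Bob locks B, Alice requests B (one wait edge, no cycle yet), then Bob requests A, which closes the cycle and makes `hasDeadlock()` return true.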

Now a supposedly smart database like SQL Server will detect this deadlock and roll back one of the transactions ("You have been chosen as the deadlock victim.  Buh-bye.").  But that's not very nice, is it?  Now the caller has to handle that scenario and retry... ugh.  It's also possible to reduce the likelihood of a deadlock by partitioning data, using finer-grained locking, shortening transaction times through various means, and/or using a looser transaction isolation level.
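The retry the caller is left with might look like the following Java sketch over JDBC. It relies on SQL Server reporting the deadlock-victim condition as error number 1205, which a SQL Server JDBC driver is expected to surface through `SQLException.getErrorCode()`; the retry limit and back-off delays are arbitrary assumptions. Because the victim's transaction has already been rolled back, the callable has to represent the entire transaction, not just the statement that failed.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

class DeadlockRetry {
    // SQL Server's error number for "chosen as the deadlock victim".
    static final int SQL_SERVER_DEADLOCK_VICTIM = 1205;

    // Runs a whole transaction, re-running it if it is picked as a deadlock
    // victim, up to maxAttempts tries. Any other error is rethrown at once.
    static <T> T run(Callable<T> transaction, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return transaction.call();
            } catch (SQLException e) {
                boolean deadlockVictim = e.getErrorCode() == SQL_SERVER_DEADLOCK_VICTIM;
                if (!deadlockVictim || attempt >= maxAttempts) {
                    throw e;
                }
                // Short, growing pause (assumed values) so the surviving
                // transaction has a chance to finish first.
                Thread.sleep(50L * attempt);
            }
        }
    }
}
```

Every caller that can deadlock needs this wrapper (or something like it), which is the cost the paragraph above is grumbling about.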

One very good idea is to access your tables in a consistent order - if you can pull it off, this will prevent the kind of lock-ordering deadlock described above, though you will still get some temporary blocking.  This is almost always worth it for a high-volume application.  The presence of foreign key constraints, however, will keep it from being bulletproof if you must support both adds and deletes, for the reasons mentioned above: adds go top-down, but deletes go bottom-up, which means you're no longer accessing things in a consistent order.
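The consistent-ordering idea can be sketched in Java with in-process locks standing in for table locks. That substitution is an assumption for illustration: a database takes its own locks, including the ones implied by foreign key checks, which is exactly why the constraints get in the way. Whatever order a caller names the tables in, locks are taken in one global (here alphabetical) order, so two callers can block each other briefly but can never wait on each other in a cycle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: one lock per table, always acquired in sorted order.
class OrderedLocks {
    private final Map<String, ReentrantLock> locks = new HashMap<>();

    OrderedLocks(String... tables) {
        for (String table : tables) {
            locks.put(table, new ReentrantLock());
        }
    }

    // Runs work while holding every requested table lock. The request order
    // is ignored; locks are taken alphabetically and released in reverse.
    void withLocks(List<String> tables, Runnable work) {
        List<String> ordered = new ArrayList<>(tables);
        Collections.sort(ordered);
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String table : ordered) {
                ReentrantLock lock = locks.get(table);
                if (lock == null) {
                    throw new IllegalArgumentException("Unknown table: " + table);
                }
                lock.lock();
                held.add(lock);
            }
            work.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```

In the scenario above, an insert asking for A then B and a delete asking for B then A both end up locking A first, so whichever arrives second simply waits.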

So what to do?  Besides the ideas above, you could consider removing the foreign key constraints from your database, if you're willing to write (and test) the significant application code required to ensure that your data integrity is bulletproof.  A side benefit of this is slightly better performance.  To date I have not found it necessary to go to this extreme.  But here's what I would like (Microsoft, are you listening?): I want constraints that are not checked until my transaction is committed.  This idea is not new; in fact Oracle has had it for quite a while, and Microsofties have been discussing it for a while.  Here's one blog entry that discusses the pros and cons of the idea.
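A rough Java model of what deferred checking buys: the foreign key from B to A is checked only at `commit()`, so Bob can delete the parent row first and the child row second, the same A-then-B order Alice uses for inserts. This is an in-memory illustration with hypothetical names, not how Oracle implements it; in Oracle the constraint itself has to be declared `DEFERRABLE`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Parent table A and child table B (B -> A), with the foreign key checked
// only when the transaction commits, as a deferred constraint would be.
class DeferredFkTransaction {
    private final Set<Integer> parents;            // keys in A
    private final Map<Integer, Integer> children;  // B key -> referenced A key

    DeferredFkTransaction(Set<Integer> parents, Map<Integer, Integer> children) {
        // Work on copies so a failed commit leaves the originals untouched.
        this.parents = new HashSet<>(parents);
        this.children = new HashMap<>(children);
    }

    // None of these check the constraint, so any order is allowed.
    void insertParent(int id) { parents.add(id); }
    void deleteParent(int id) { parents.remove(id); }
    void insertChild(int id, int parentId) { children.put(id, parentId); }
    void deleteChild(int id) { children.remove(id); }

    // The constraint is enforced once, here. (A real database would then
    // make the changes permanent; this model only validates.)
    void commit() {
        for (Map.Entry<Integer, Integer> child : children.entrySet()) {
            if (!parents.contains(child.getValue())) {
                throw new IllegalStateException(
                    "Child " + child.getKey() + " references missing parent " + child.getValue());
            }
        }
    }
}
```

Deleting parent 1 before its child 10 would fail an immediate check, but here `commit()` succeeds because both rows are gone by then; committing after deleting only the parent throws.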


Software

Unit Testing Thoughts

by msimpson 11/13/2007 6:03:00 PM

I had an interesting discussion with a former colleague, Clay Lenhart, where he said he's currently disenchanted with test-driven development because of the need to constantly update the unit tests as the application code changes.  This struck me as a valid point; TDD, viewed in that light, would seem to better suit a traditional waterfall development process than an agile one.  Agile processes supposedly embrace the change that will inevitably occur in requirements, design, etc., and try to mitigate its impact by (among other tactics) reducing the effort involved with refactoring.  Writing the test first seems to assume that you can get your requirements right up front - the hallmark of your father's development process, right?

The more I think about this, the more I think there's an appropriate time to write unit tests, and it can be discovered by looking at code churn.  When the churn in a module falls below a certain threshold, the code is stable enough to test, and only then do you write the tests and automate them.  Early on, when the code is being ripped apart and put back together wholesale, this lets you operate at maximum speed and keeps a growing pile of tests from discouraging refactoring; later in the project, it still preserves the benefits of unit testing.
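A sketch of such a gate in Java. The post does not define a churn metric, so the one here is an assumption: lines changed across recent commits divided by the module's current size, compared against a threshold you choose. Feeding it real numbers would mean pulling per-commit line counts from version control.

```java
import java.util.List;

// Hypothetical "stable enough to test?" check based on an assumed churn metric.
class ChurnGate {
    private final double threshold;

    ChurnGate(double threshold) {
        this.threshold = threshold;
    }

    // Churn = total lines changed in recent commits / current module size.
    static double churn(List<Integer> linesChangedPerCommit, int moduleSizeInLines) {
        if (moduleSizeInLines <= 0) {
            throw new IllegalArgumentException("Module size must be positive");
        }
        int total = 0;
        for (int changed : linesChangedPerCommit) {
            total += changed;
        }
        return (double) total / moduleSizeInLines;
    }

    // True once churn has fallen to or below the threshold.
    boolean readyForTests(List<Integer> linesChangedPerCommit, int moduleSizeInLines) {
        return churn(linesChangedPerCommit, moduleSizeInLines) <= threshold;
    }
}
```

With a threshold of 0.25, a 400-line module that saw 120, 80 and 200 changed lines recently (churn 1.0) is not ready, while one that saw 10, 5 and 5 (churn 0.05) is.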

Broadly speaking, I would relate this approach to my general approach to decision-making, which is to defer decisions until I feel that either 1) I have to make a decision to proceed, regardless of the information available, or 2) little or no more useful information is forthcoming.  In an even broader sense, the point is that there's a tradeoff between doing things sooner vs. later, and you have to find the point on the timeline that yields the most benefit.  Deep Thoughts By Mike ;-)

Speaking of unit testing, I'm thinking of developing an API and tool that will generate test data based on policies specified by the developer, and in general facilitate unit test development.  Here are some ideas:

Test Data Generation

  • Provide scheme for specifying generated data formats
    • Minimum length (character data)
    • Maximum length (character data)
    • Minimum value (numeric data)
    • Maximum value (numeric data)
    • Hardcoded portions
    • Random portions
      • Within limits
    • Data types (string, numeric, strings of only certain characters, etc.)
    • Values from another table
      • Generate a dependency in the test data generation, so that if we’re generating data for the remote table, we do that first
    • Manual specification of test data by the developer
  • Specify test data generation policies for each desired table and column in the database
    • Number of rows desired
    • Whether to violate data generation policy
    • Provide reasonable defaults so not everything needs to be specified
  • Allow scheduled, programmatic or ad-hoc test data generation
  • Allow generation of test data without stepping on existing (presumably user-entered) data
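A Java sketch of one column policy covering a few of the items above: minimum and maximum length, a hard-coded portion, a random portion from an allowed character set, a row count, and a deliberate policy violation. The class and method names are hypothetical; passing a seeded `Random` keeps generated data repeatable between test runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical policy for one character column.
class StringColumnPolicy {
    private final int minLength;
    private final int maxLength;
    private final String fixedPrefix;   // hard-coded portion
    private final String allowedChars;  // alphabet for the random portion

    StringColumnPolicy(int minLength, int maxLength, String fixedPrefix, String allowedChars) {
        if (minLength > maxLength || fixedPrefix.length() > maxLength || allowedChars.isEmpty()) {
            throw new IllegalArgumentException("Inconsistent column policy");
        }
        this.minLength = minLength;
        this.maxLength = maxLength;
        this.fixedPrefix = fixedPrefix;
        this.allowedChars = allowedChars;
    }

    // One value: the prefix followed by random characters, with a total
    // length chosen uniformly within the allowed range.
    String generate(Random random) {
        int shortest = Math.max(minLength, fixedPrefix.length());
        int length = shortest + random.nextInt(maxLength - shortest + 1);
        StringBuilder value = new StringBuilder(fixedPrefix);
        while (value.length() < length) {
            value.append(allowedChars.charAt(random.nextInt(allowedChars.length())));
        }
        return value.toString();
    }

    // Deliberately breaks the policy (one character too long) for negative tests.
    String generateViolating(Random random) {
        StringBuilder value = new StringBuilder(generate(random));
        while (value.length() <= maxLength) {
            value.append(allowedChars.charAt(0));
        }
        return value.toString();
    }

    // "Number of rows desired": a batch of policy-conforming values.
    List<String> generateRows(int rowCount, Random random) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            rows.add(generate(random));
        }
        return rows;
    }
}
```

For example, `new StringColumnPolicy(6, 10, "CUST-", "0123456789")` yields values such as a `CUST-` prefix followed by one to five digits.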

Unit Test Support API

  • Clear table
  • Generate test data in accordance with specified policies
  • Assert existence of specified rows
  • Assert existence of a specified number of specified rows
  • Assert non-existence of specified rows
  • Delete specified rows
  • Delete test data generated as part of this test
  • Support notification upon success and/or failure (does NUnit do this already?)
  • Support generation of a report with test results (does NUnit do this already?)
  • Clean up
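A minimal in-memory Java sketch of part of that API. It stands in for a real database (rows are maps of column to value) and shows one way to implement "delete test data generated as part of this test": remember the generated rows by identity, so cleanup never touches pre-existing data even when the values happen to match. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical test-support facade over an in-memory stand-in for a database.
class TestDataSupport {
    private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();
    // Rows this test generated, compared by identity rather than by value.
    private final Set<Map<String, Object>> generatedRows =
        Collections.newSetFromMap(new IdentityHashMap<>());

    private List<Map<String, Object>> table(String name) {
        return tables.computeIfAbsent(name, k -> new ArrayList<>());
    }

    // Pre-existing (e.g. user-entered) data.
    void insertExisting(String tableName, Map<String, Object> row) {
        table(tableName).add(row);
    }

    // Data created by the test; remembered for cleanup.
    void insertGenerated(String tableName, Map<String, Object> row) {
        table(tableName).add(row);
        generatedRows.add(row);
    }

    void clearTable(String tableName) {
        table(tableName).clear();
    }

    long countRows(String tableName, Predicate<Map<String, Object>> match) {
        return table(tableName).stream().filter(match).count();
    }

    void assertExists(String tableName, Predicate<Map<String, Object>> match) {
        if (countRows(tableName, match) == 0) {
            throw new AssertionError("Expected matching rows in " + tableName);
        }
    }

    void assertRowCount(String tableName, Predicate<Map<String, Object>> match, long expected) {
        long actual = countRows(tableName, match);
        if (actual != expected) {
            throw new AssertionError("Expected " + expected + " matching rows in "
                + tableName + ", found " + actual);
        }
    }

    void assertNotExists(String tableName, Predicate<Map<String, Object>> match) {
        assertRowCount(tableName, match, 0);
    }

    void deleteRows(String tableName, Predicate<Map<String, Object>> match) {
        table(tableName).removeIf(match);
    }

    // Removes exactly the rows generated during this test.
    void cleanUpGenerated() {
        for (List<Map<String, Object>> rows : tables.values()) {
            rows.removeIf(generatedRows::contains);
        }
        generatedRows.clear();
    }
}
```

Against a real database the same bookkeeping would presumably track generated primary keys instead of object identity.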

Unit Test Generation?

  • Template-driven
  • Mocking (probably skip this in favor of third-party framework or built-in VS support)
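Template-driven generation can start as simple placeholder substitution. The Java sketch below fills a `{{name}}`-style template and refuses to emit a test with unfilled placeholders. The placeholder syntax and the NUnit-style C# template are assumptions, and names such as `Calculator` and `Add` would come from whatever metadata drives the generator.

```java
import java.util.Map;

// Hypothetical template-driven test generator.
class TestTemplate {
    // Example template producing an NUnit-style C# test method (assumed format).
    static final String NUNIT_METHOD =
        "[Test]\n"
        + "public void {{method}}_ReturnsExpected()\n"
        + "{\n"
        + "    Assert.AreEqual({{expected}}, new {{type}}().{{method}}());\n"
        + "}\n";

    private final String template;

    TestTemplate(String template) {
        this.template = template;
    }

    // Replaces every {{key}} with its value; fails if any placeholder remains.
    String render(Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("{{" + entry.getKey() + "}}", entry.getValue());
        }
        if (result.contains("{{")) {
            throw new IllegalArgumentException("Unfilled placeholder in:\n" + result);
        }
        return result;
    }
}
```

Rendering `NUNIT_METHOD` with `type = Calculator`, `method = Add` and `expected = 3` yields a test method named `Add_ReturnsExpected` that asserts `new Calculator().Add()` returns 3.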

Anyway, the first step is to research what's already out there.


Software





    Disclaimer

    The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

    © Copyright 2010
