Saturday, April 23, 2011

Integrating NoSQL into your SQL application

There are basically 4 ways you can integrate the use of NoSQL in your SQL application.

1) Just stick to SQL.
The issue here is that there is likely something you need to do with your data that your SQL database is not efficient with. That may be why you're looking at some NoSQL data store in the first place. Just sticking with your SQL database may be valid depending on your needs. It is well known, safe, and has excellent tools support. There are definite drawbacks to splitting your data and bringing in a new and unknown data store. However, you're likely giving up something by sticking to SQL only.

2) Convert everything to NoSQL.
This might be a real stretch, especially for a transactional application. Your data will need to be denormalized as a start. You really need to understand the limits of the NoSQL data store as well as the advantages that you'll gain. Going to NoSQL can be a bit of a culture shock. You're likely going to be better off looking to a column store for your choice as some things will at least look familiar to you. One of the big advantages is that you're not splitting your data between multiple stores.

3) Split your data, use SQL for transactional things and NoSQL for it's specialty.
This is probably the most common use case as you're getting the best of both worlds here. Let the SQL database do what it does best, usually transactions and speed of development with the great tool support. Pick a specific NoSQL data store and let it do what it does best. Perhaps a graph data store to handle relationship data or a document data store to handle BLOB/CLOB storage.
The really big drawback here is that you're splitting your data. Whenever you do that, consistency becomes a real issue though this may be tempered by the character of your data. Companies handle this problem in a variety of ways, but that's a whole discussion all by itself.

4) Use map/reduce on NoSQL to process data, then feed results back to SQL or NoSQL.
This is also a very common use case. This could be either a variation or enhancement to item 3 depending on who it's used. Again, we're using each data store type for it's strengths. If you feed the results back to your SQL database, then you're mostly avoiding the consistency problem of split data. Otherwise, you're going to have to deal with consistency problems as in item 3.
The big advantage here is that you have the most flexibility with this setup. You can process or otherwise transform your input data into a more usable form for your application then import it into the data store of your choice. Map/reduce is your best friend here. You need to remember that you're going to add some latency to your data flow because of this process flow.