Theory of Constraints: the rise of NoSQL
Why did NoSQL arise? The rise of NoSQL makes no sense if you read an article like “Why You Should Never Use MongoDB” and the ycombinator replies . The I read something like this “Why You Should Use MongoDB” and don’t find any reasons just ranting about why the first article was shoddy.
One theory proposed by Martin Fowler is that NoSQL arose as a way to get around the constraint of the DBA bottleneck. When developers want to change the schema or get copies of the source database the DBAs are just seen as an obstacle. The obstacle of provisioning environment is told in nail biting prose in Gene Kim’s “The Phoenix Project”
Long before NoSQL and The Phoenix project, about 13 years ago, I was the DBA at a startup, back in the first dot com bubble. As a startup we were trying to get our project to market ASAP. The development team was working crazy hours and the schema was changing all the time. As the DBA, it was my responsibility to validate all the schema changes and apply the schema changes to the development database. Such validation and application could take a couple hours considering all the other work that was on my plate. This couple hour delay was considered such a hinderance to developers that they unilaterly decided to dump the structured schema and go to an EAV data store. EAV stands for “Entity Attribute Values” model. From wikipedia
Select f.symbol from strings_eav eav1, integer_eav eav2, strings_eav eav3, fundmstr f where f.id = eav1.id and eav1.id = eav2.id and eav1.id = eav3.id and eav1.attr_num = (select attar_num from attributes where attr_id = :"SYS_B_00") and eav1.attr_value=:"SYS_B_01" and eav1.act_date <= SYSDATE and eav1.end_date > SYSDATE and eav2.attr_num = (select attar_num from attributes where attr_id = :"SYS_B_02") and eav2.attr_value=:"SYS_B_03" and eav2.act_date <= SYSDATE and eav2.end_date > SYSDATE and ((f.def_attr_templ = :"SYS_B_04" and eav3.attr_num=(select attar_num from attributes where attr_id = :"SYS_B_05") and eav3.attr_value >= to_date(:"SYS_B_06") and eav3.act_date <= SYSDATE and eav3.end_date > SYSDATE and exists ( select null from integer_eav eav4 where f.id = eav4.id and eav4.id in ( select attr_num from attributes where attar_id = :"SYS_B_07" or attar_id = :"SYS_B_08") and eav4.attr_value = :"SYS_B_09" and eav4.act_date <= SYSDATE and eav4.end_date > SYSDATE) and not exists ( select null from integer_eav eav5 where f.id = eav5.id and eav5.id = ( select attr_num from attributes where attar_id = :"SYS_B_10")
Eventually as development stabilized and performance was abominable we were able to get everyone on board and move the EAV back into a structured schema. Once we went to a structured relational schema the load on the database fell from completely saturated to almost idle.
Moral of the story
EAV can be good for quick development but for production it’s completely unreadable thus impossible to debug and the performance is atrocious.
Instead of using EAV one could give each developer a virtual database and let them work as fast as they can and then let them test out merging there changes in virtual databases before merging them in to trunk which is itself a virtual database that can be branched from and merged into.
Virtual databases take up almost no storage and can be made in minutes with a developer self service interface.