



The ongoing popularity of the Hadoop approach to data management should not blind us to the opportunities offered by other database technologies – technologies that, when used in the proper mix, can deliver adequate scaling to meet the demands of the endless 50 percent-plus yearly growth in data to be analyzed, along with higher data quality for better business decisions.

The new(er) technologies can basically be categorized as in-memory databases, Hadoop and related NoSQL data stores, and virtualized databases. Let’s consider briefly the pros and cons of each, and where each might fit in an optimized information infrastructure.

The core idea of an in-memory database is to assume that most or all storage is “flat” - that is, any piece of data takes about as long to access as any other. Without the endless software needed to deal with disks and tape, such a database can perform one or two orders of magnitude faster than a traditional relational database on problems like analyzing stock-market data on the fly. Moreover, the boundary between “flat” and disk storage is moving upwards rapidly: IBM, for example, has just announced plans to deliver mainframes this year with more than a terabyte of “flat” memory per machine. Thus, an increasing percentage of business analytics and other needs can be handled by in-memory technology.

The major database vendors have in-memory technology of their own (e.g., Oracle TimesTen) and offer it either as a low-end solution or lashed together with their relational databases so that users can semi-manually assign the right transaction stream to the right database. An interesting exception is SAP HANA, which is “in-memory-centric,” focusing on in-memory-type transaction schemes and leaving users to link HANA with their relational databases if they wish.
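To make the “flat storage” idea concrete, here is a minimal sketch that runs the same small workload against an in-memory SQLite database and an on-disk one. SQLite is used only because it ships with Python; it is not one of the enterprise in-memory engines named above, and the gap it shows mostly reflects the page-management and journaling work the on-disk run must do.

```python
# Sketch only: SQLite stands in for an in-memory engine purely for illustration.
import os
import random
import sqlite3
import tempfile
import time

# One shared data set so both runs do identical work.
ROWS = [(i, random.choice(["IBM", "SAP", "ORCL"]), random.uniform(10, 500))
        for i in range(200_000)]

def run_workload(conn):
    """Load the rows and run one aggregate query, returning elapsed seconds."""
    start = time.perf_counter()
    conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT, price REAL)")
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", ROWS)
    conn.commit()
    conn.execute("SELECT symbol, AVG(price) FROM trades GROUP BY symbol").fetchall()
    return time.perf_counter() - start

# "Flat" storage: every page lives in RAM, so access cost is roughly uniform.
mem_secs = run_workload(sqlite3.connect(":memory:"))

# On-disk storage: the same engine must also manage pages, a journal and syncs.
disk_secs = run_workload(sqlite3.connect(os.path.join(tempfile.mkdtemp(), "trades.db")))

print(f"in-memory: {mem_secs:.3f}s  on-disk: {disk_secs:.3f}s")
```

The point is not the absolute numbers but the shape of the difference: the in-memory path has no disk bookkeeping to pay for, which is what lets dedicated in-memory engines reach the order-of-magnitude gains described above.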
Hadoop grew out of, and specializes in, the need to apply looser standards to data management (e.g., allowing temporary inconsistency between data copies or related data) in order to maximize throughput. Over time, this approach has standardized on storing massive amounts of Web/cloud data as files, handled by the Hadoop data-access software. Thus, NoSQL does not mean “no relational allowed” but rather “where relational simply can’t scale adequately.”

In turn, “enterprise Hadoop” has shown up inside large enterprises as a way of downloading and processing key customer and other social media data in-house. Both in-house and public-cloud Hadoop are just beginning to emerge from their Wild West stages, and reports still come in of several-hour outages and data inconsistencies that make analytics on this data a real art form. Moreover, the in-house implementations are especially prone to performance problems because they unnecessarily try to move the data to the Hadoop or relational engine rather than using “touch first, move if necessary” technologies. Several vendors also encourage such downloads rather than providing on-cloud processing as part of their solutions. Optimum Hadoop solutions tend to be (a) good at allocating tasks between Hadoop and the user’s relational database, and (b) good at minimizing data transport.
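That routing decision can be pictured with a small, entirely hypothetical sketch: send the job to whichever engine already holds most of its input, so that only the smaller data set (or just the result) has to move. The Source class and the size figures below are invented for illustration and do not correspond to any vendor’s API.

```python
# Hypothetical sketch of "touch first, move if necessary" task allocation.
from dataclasses import dataclass

@dataclass
class Source:
    name: str       # e.g. "hdfs://clickstream/2014/" or "warehouse.customers" (made up)
    engine: str     # "hadoop" or "relational"
    size_gb: float  # rough estimate of the data the job must touch

def route_job(sources):
    """Pick the engine that already holds most of the bytes ("touch first"),
    so only the smaller side has to be shipped ("move if necessary")."""
    totals = {"hadoop": 0.0, "relational": 0.0}
    for s in sources:
        totals[s.engine] += s.size_gb
    target = max(totals, key=totals.get)
    data_to_move = min(totals.values())
    return target, data_to_move

target, moved_gb = route_job([
    Source("hdfs://clickstream/2014/", "hadoop", 850.0),
    Source("warehouse.customers", "relational", 12.0),
])
print(f"run the join on {target}, shipping about {moved_gb:.0f} GB instead of 862 GB")
```

The design choice this illustrates is the one the paragraph above criticizes in-house implementations for getting wrong: moving the computation to the bulk of the data, rather than dragging the data to a favorite engine.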
The fundamental idea of the virtualized database, as offered by vendors such as Composite Software (now owned by Cisco) and Denodo, is to provide a “veneer” that looks like a database and allows common SQL-like access to widely disparate data sources (e.g., text/content, video/graphic, relational, or email/texting). Over time, this aim has come pretty close to complete reality, as virtualized databases now offer administration, one-interface development and, of course, dynamically evolving support for most if not all of today’s new data types.
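As a toy illustration of that “veneer,” the sketch below puts one query interface in front of two very different sources: an ordinary relational table and a flat CSV export standing in for text/content data. The adapter classes are invented for this example and say nothing about how Composite (Cisco) or Denodo actually implement federation.

```python
# Toy data-virtualization sketch; the adapters are invented for illustration.
import csv
import io
import sqlite3

class RelationalAdapter:
    """Wraps an ordinary SQL source (here an in-memory SQLite table)."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
        self.conn.executemany("INSERT INTO customers VALUES (?, ?)",
                              [(1, "Acme"), (2, "Globex")])
    def rows(self):
        return [{"id": i, "name": n}
                for i, n in self.conn.execute("SELECT id, name FROM customers")]

class TextAdapter:
    """Wraps a flat text/CSV source, e.g. an exported contact list."""
    def __init__(self, text):
        self.text = text
    def rows(self):
        return list(csv.DictReader(io.StringIO(self.text)))

class VirtualTable:
    """The 'veneer': callers see one table no matter where the rows live."""
    def __init__(self, *adapters):
        self.adapters = adapters
    def select(self, predicate=lambda row: True):
        for adapter in self.adapters:
            for row in adapter.rows():
                if predicate(row):
                    yield row

support_contacts = "id,name\n2,Globex\n3,Initech\n"
customers = VirtualTable(RelationalAdapter(), TextAdapter(support_contacts))
print(sorted(row["name"] for row in customers.select()))
```

A real virtualization layer goes much further, pushing filters and joins down into each source rather than pulling every row up through the veneer; that pushdown is where most of the engineering effort lies.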
