Free data warehouse - Infobright, Hadoop/Hive or what?

You could also consider GridSQL. Even for a single server, you can create multiple logical "nodes" to utilize multiple cores when processing queries. GridSQL uses PostgreSQL, so you can also take advantage of partitioning tables into subtables to evaluate queries faster.

You mentioned the data is time-oriented, so that would be a good candidate for creating subtables.
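To make the subtable idea concrete, here is a minimal sketch that generates the DDL for one monthly subtable in plain PostgreSQL inheritance style. The table name `events` and the column `created_at` are hypothetical, and the answer does not say whether GridSQL accepts this DDL unchanged, so treat it as an illustration of the PostgreSQL side only.

```python
from datetime import date


def next_month(d: date) -> date:
    """Return the first day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_subtable_ddl(parent: str, column: str, year: int, month: int) -> str:
    """Build a CREATE TABLE statement for one monthly child of `parent`.

    The CHECK constraint bounds `column` to a half-open range
    [first day of month, first day of next month).
    All names passed in are illustrative, not taken from the question.
    """
    lo = date(year, month, 1)
    hi = next_month(lo)
    child = f"{parent}_{lo:%Y_%m}"
    return (
        f"CREATE TABLE {child} (\n"
        f"    CHECK ({column} >= DATE '{lo}' AND {column} < DATE '{hi}')\n"
        f") INHERITS ({parent});"
    )


if __name__ == "__main__":
    # Hypothetical parent table and timestamp column.
    print(monthly_subtable_ddl("events", "created_at", 2010, 3))
```

With a CHECK constraint like this on each subtable, PostgreSQL can skip the subtables whose range cannot match a query's time filter (constraint exclusion), which is where the speedup for time-oriented data comes from.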

Just adding, yes, I work for EnterpriseDB, who sponsors GridSQL. – Mason Mar 17 '10 at 22:53

It looks like GridSQL died recently and developers moved to Stado. – Peter Gwiazda Oct 25 at 6:09

I am having the same problem here and did some research. There are a few types of storage for BI:

Column-oriented, free and well known: MonetDB, LucidDB, Infobright, InfiniDB
Distributed: hTable, Cassandra (also column-oriented, in theory)
Document-oriented: MongoDB, CouchDB

The answer depends on what you really need.

If your millions of rows are loaded at once (a nightly batch or so), InfiniDB or another column-oriented DB is the best choice. They have great performance and are "BI oriented". http://www.d1solutions.ch/papers/d1_2010_hauenstein_real_life_performance_database.pdf And they won't require a setup of "nodes", "sharding" and the other stuff that comes with distributed/"NoSQL" DBs. http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/

If the rows are added in real time, then column-oriented DBs are a poor fit. You can either have two separate DBs (that's my choice: one NoSQL DB fed in real time by the front end and used for real-time stats, and a column-oriented DB for BI), or turn to something that mixes column orientation (for our queries) with distribution (for writes), like Cassandra.

Document-oriented DBs are not suited for BI; they are more useful for CRM/CMS problems where you need frequent access to a particular row.

As for the exact choice inside a category, I'm still undecided. Cassandra among the distributed stores, and Monet or InfiniDB among the column-oriented DBs, are the leaders. Monet is reported to have problems loading very big tables because it keeps its indexes in memory.
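To illustrate why column-oriented storage suits batch loads and aggregates but not row-at-a-time inserts, here is a toy in-memory sketch. The classes `RowStore` and `ColumnStore` are made up for this example and say nothing about how Infobright, InfiniDB or MonetDB are actually implemented.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class RowStore:
    """Keeps each record together, as a row-oriented table would."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        # A single append stores the whole record.
        self.rows.append(row)

    def total(self, column: str) -> float:
        # The aggregate has to visit every full row to read one field.
        return sum(r[column] for r in self.rows)


class ColumnStore:
    """Keeps each column in its own list, as a column-oriented table would."""

    def __init__(self, columns: List[str]) -> None:
        self.columns: Dict[str, list] = {c: [] for c in columns}

    def insert(self, row: Row) -> None:
        # A single-row insert touches every column separately.
        for name, values in self.columns.items():
            values.append(row[name])

    def bulk_load(self, rows: List[Row]) -> None:
        # A batch load extends each column once for the whole batch.
        for name, values in self.columns.items():
            values.extend(r[name] for r in rows)

    def total(self, column: str) -> float:
        # The aggregate reads only the one column it needs.
        return sum(self.columns[column])
```

The aggregate in `ColumnStore` never looks at the other columns, while each single-row `insert` has to write to all of them. That is the trade-off behind the "nightly batch versus real-time feed" distinction above.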

If you're looking for compatibility with reporting tools, something based on MySQL may be your best choice. As for what will work for you, Infobright may work. There are several other solutions as well; however, you may also want to look at plain old MySQL and the Archive table type (the ARCHIVE storage engine). Each record is compressed and stored and, IIRC, it's designed for your type of workload, though I think Infobright is supposed to get better compression. I haven't really used either, so I'm not sure which will work best for you.

As for the key-value stores (e.g. NoSQL), yes, they can work as well and there are plenty of alternatives out there. I know CouchDB has "views", but I haven't had the opportunity to use any, so I don't know how well any of them work.

My only concern with your data set is that since you mentioned time, you may want to ensure that whatever solution you use will allow you to archive data past a certain time. It's a common data warehouse practice to keep only N months of data online and archive the rest. This is where partitioning, as implemented in an RDBMS, comes in very useful.
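As a rough sketch of the "keep N months online" practice, the following assumes one table per month named `events_YYYY_MM` (a made-up naming scheme) and drops the tables that fall outside the retention window. SQLite is used only so the example runs on its own; the same idea applies to partitions or subtables in MySQL or PostgreSQL, with the archive step left as a comment.

```python
import sqlite3
from datetime import date
from typing import List


def partition_name(d: date) -> str:
    """Hypothetical naming scheme: one table per calendar month."""
    return f"events_{d.year:04d}_{d.month:02d}"


def expired_partitions(conn: sqlite3.Connection, today: date, keep_months: int) -> List[str]:
    """List monthly tables older than the last `keep_months` months (current month included)."""
    # Months are numbered as year * 12 + (month - 1) so they compare as integers.
    cutoff = today.year * 12 + (today.month - 1) - keep_months + 1
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name GLOB 'events_[0-9][0-9][0-9][0-9]_[0-9][0-9]'"
        )
    ]
    expired = []
    for name in sorted(names):
        year, month = int(name[-7:-3]), int(name[-2:])
        if year * 12 + (month - 1) < cutoff:
            expired.append(name)
    return expired


def drop_expired(conn: sqlite3.Connection, today: date, keep_months: int) -> List[str]:
    """Drop every expired monthly table and return the names that were dropped."""
    dropped = expired_partitions(conn, today, keep_months)
    for name in dropped:
        # A real job would copy the table to archive storage before this point.
        conn.execute(f'DROP TABLE "{name}"')
    return dropped
```

Dropping a whole monthly table is a metadata operation rather than a row-by-row DELETE, which is the main reason partitioning makes this kind of archiving cheap.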

