Drive-space-hungry NoSQL databases?

MongoDB has a "database repair" function that also performs a compaction. However, such a compaction is not going to happen while the DB is running. But if DB space is a serious issue, then try setting up a MongoDB master/slave pair. When the data needs compaction, run the repair on the slave, allow it to "catch up" and then switch them over.

You can now safely compact the former master as well. But I have to echo jbellis's comment: you will probably need more space, and most of these products assume that disk space is (relatively) cheap. If disk space is really tight, then you'll find that MongoDB is reasonably sized, but it's going to have a difficult time competing with tabular CSV data.

Think of it this way: what's more space efficient, a CSV file with a million lines or that same data formatted in JSON? Obviously the JSON is going to be longer, because you're repeating the field names every time. The only exception here is a CSV file with something like 100 columns of which only a few are filled for each row (but that's probably not your data).
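To make the comparison concrete, here is a minimal sketch that serializes the same rows as CSV and as one JSON document per line and compares the byte counts. The field names and row counts are made up for illustration, and MongoDB actually stores BSON rather than JSON text, so treat the numbers as an indication of the per-document field-name overhead, not as a measurement of any real database.

```python
import csv
import io
import json


def make_rows(n, fields):
    """Build n rows where every field holds the row number (illustrative data)."""
    return [{name: i for name in fields} for i in range(n)]


def csv_size(rows, fields):
    """Bytes needed to store the rows as CSV: field names appear once, in the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def json_size(rows):
    """Bytes needed to store one JSON document per line: field names repeat in every row."""
    return sum(len(json.dumps(row).encode("utf-8")) + 1 for row in rows)


# Hypothetical field names, chosen only to contrast short and long keys.
short_names = ["id", "a", "b"]
long_names = ["id", "customer_account_number", "last_modified_timestamp"]

for fields in (short_names, long_names):
    rows = make_rows(10_000, fields)
    print(fields, "CSV bytes:", csv_size(rows, fields), "JSON bytes:", json_size(rows))
```

With long field names the CSV size barely changes (only the header grows), while the JSON size grows with every row.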

That's true, if you use long field names you need more disk space when using MongoDB. And MongoDB preallocates files of 2 gigabytes. – TTT Jun 10 '10 at 7:27

Yes, CouchDB has a "compact" option too, which in my test reduced the DB size severalfold (Cassandra does this "in the background" because of its better-organized bulk writes). – inquisitor Jun 10 '10 at 9:25

Piggybacking on this: if the problem is one node not having enough disk space, try something like HBase or Cassandra, where it's very easy to add more data storage (and processing power!) simply by adding more nodes. I don't know how MongoDB/CouchDB are structured, so I don't know if you can easily and simply do the same thing with them. – Drizzt321 Aug 18 at 23:19

MongoDB also allows for "horizontal" scaling via sharding. – Gates VP Aug 19 at 6:48

Disk space is about the cheapest resource today, so if you can trade it for fewer seeks or less CPU use, it is a good trade to make. That is what Cassandra does.

Magnetic disk space yes, but not SSD space, which is what you'll want a high-performance DB stored on anyway. Seeks, on the other hand, are nearly free on SSDs. Furthermore, efficiently packing data into pages on disk means potentially much more effective caching at the page buffer layer within the DB, another win. – TheManWithNoName Dec 22 '10 at 7:16

Magnetic vs. SSD is not one-size-fits-all; if your hot data set fits into RAM (very common!) then SSD is just flushing money down the drain. For less predictable workloads you do see Cassandra deployed on SSDs, where its avoidance of seeks on writes is a big win for (non) write amplification. – jbellis Dec 23 '10 at 5:48

Many databases sparsely allocate file structures and their "length" is much larger than their on-disk size.
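A quick way to see the difference between a file's "length" and its on-disk size is to create a sparse file and compare the two numbers. The sketch below is an illustration under assumptions: it relies on `st_blocks` being reported in 512-byte units (true on Linux and most POSIX systems, not available on Windows), and whether the hole is really left unallocated depends on the filesystem. The helper names are hypothetical.

```python
import os
import tempfile


def make_sparse_file(path, length):
    """Create a file of the given length by seeking past the end and writing one byte."""
    with open(path, "wb") as f:
        f.seek(length - 1)
        f.write(b"\0")


def file_sizes(path):
    """Return (apparent_size, allocated_bytes); allocated_bytes is None if unavailable."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # missing on some platforms
    allocated = blocks * 512 if blocks is not None else None  # assumes 512-byte units
    return st.st_size, allocated


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.dat")
    make_sparse_file(path, 16 * 1024 * 1024)
    apparent, allocated = file_sizes(path)
    print("apparent size:", apparent, "bytes; allocated on disk:", allocated, "bytes")
```

On a filesystem that supports sparse files, the allocated figure is typically far smaller than the apparent size, which is the same comparison `ls -l` versus `du` gives you for a database's data files.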

I checked that too; the file buffer isn't that big, so I don't even consider it for a DB of around 15 million documents (even if it comes to a few GB). I think this "space hunger" is a weakness of schemaless DBs, but I'm not sure. – inquisitor Jun 9 '10 at 18:01

I think the problem is the key. CouchDB stores its data in a B-tree. UUID keys are the reason you need a large amount of disk space.

A B-tree stores data compactly by nature, except with UUIDs. Try to find a key that is more comfortable for a B-tree. Best regards.
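As a rough illustration of why random UUID keys sit less comfortably in a sorted index than sequential ones, the sketch below compares two things: the raw bytes spent on the keys, and how many inserts arrive out of sorted order (and would therefore land in the middle of the index rather than at its end). It does not model CouchDB's actual storage format; the key widths and helper names are assumptions made for the example.

```python
import uuid


def sequential_keys(n, width=8):
    """Zero-padded counters: short keys that always arrive in sorted order."""
    return [format(i, "0{}d".format(width)) for i in range(n)]


def uuid_keys(n):
    """Random UUIDs as 32-character hex strings: longer keys in effectively random order."""
    return [uuid.uuid4().hex for _ in range(n)]


def total_key_bytes(keys):
    """Bytes spent on the keys alone."""
    return sum(len(k.encode("ascii")) for k in keys)


def out_of_order_inserts(keys):
    """Count keys that sort before the largest key seen so far."""
    count = 0
    largest = None
    for key in keys:
        if largest is not None and key < largest:
            count += 1
        else:
            largest = key
    return count


for name, keys in (("sequential", sequential_keys(10_000)), ("uuid4", uuid_keys(10_000))):
    print(name, "key bytes:", total_key_bytes(keys),
          "out-of-order inserts:", out_of_order_inserts(keys))
```

Sequential keys are both shorter and append-only from the index's point of view, while almost every random UUID has to be inserted somewhere in the middle.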
