Amazon EC2 & S3 When using Python / SQLite?

"I'd like to start say 10 different Amazon EC2 instances that will each read a subset of the file and do some processing (every instance will handle a different subset of the 500MB SQLite file)."

You cannot do this with SQLite, on Amazon infrastructure or otherwise. SQLite performs database-level write locking. Unless all ten nodes are performing reads exclusively, you will not attain any kind of concurrency.

Even the SQLite website says so, under "Situations Where Another RDBMS May Work Better":

- Client/Server Applications
- High-volume Websites
- Very large datasets
- High Concurrency

Have you considered PostgreSQL?
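As a side note on the read-only exception above: a minimal sketch (the `records` table, column names, file name, and id range are all invented for illustration) of one worker opening the same SQLite file in read-only mode, so no database-level write lock is ever taken:

```python
# Minimal sketch: each worker instance opens the shared 500MB SQLite file
# read-only and scans only its assigned id range. Table/column/file names
# and the id range are hypothetical placeholders.
import sqlite3

def process_subset(db_path, first_id, last_id):
    # mode=ro requires the URI form of connect(); any write attempt will fail.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT id, payload FROM records WHERE id BETWEEN ? AND ?",
            (first_id, last_id),
        ).fetchall()
        return len(rows)  # stand-in for whatever processing each instance does
    finally:
        conn.close()

if __name__ == "__main__":
    print(process_subset("data.sqlite", 0, 49999))
```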

I suspect that the OP is looking to scale out due to the processing requirements rather than the DB concurrency. If that is the case, then locking the DB will (may?) not be an issue. – Michael Anderson Jun 10 at 6:43

Even so, SQLite tends to abort transactions if it feels they may be concurrent. The only way to do this would be to use a mutex lock (something that is outside the scope of SQLite) to obtain the database for a write. Really, any RDBMS other than SQLite is likely to be more practical/convenient. – TokenMacGuy Jun 10 at 6:46

Guys, thank you. The thing is, after my processing is over, I'd like to send over the results (= the SQLite file). While MySQL or so can help to store the results on Amazon's end -- how will I retrieve the results back to my computer? – user3262424 Jun 10 at 14:32

@user3262424: native database binary formats are undesirable. You should prefer interchange formats that are easy to convert from one application to another. The gold standard in database interchange is CSV. You can do that easily with mysqldump: dev.mysql.com/doc/mysql-backup-excerpt/5.5/en/… – TokenMacGuy Jun 10 at 0:03

Thank you, that's a good point. – user3262424 Jun 12 at 3:24
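Along the lines of that comment, a minimal sketch (the file and table names are invented) of exporting results from a SQLite file to CSV with the standard library, so only a plain-text interchange file has to travel back from Amazon:

```python
# Minimal sketch: dump one table of a results SQLite file to CSV so the data
# can be fetched back and loaded anywhere. "results.sqlite" and the "results"
# table are placeholders for whatever the processing job actually produces.
import csv
import sqlite3

def dump_table_to_csv(db_path, table, csv_path):
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(f"SELECT * FROM {table}")
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cursor.description])  # header row
            writer.writerows(cursor)
    finally:
        conn.close()

dump_table_to_csv("results.sqlite", "results", "results.csv")
```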

Since S3 cannot be directly mounted, your best bet is to create an EBS volume containing the SQLite file and work directly with the EBS volume from another (controller) instance. You can then create snapshots of the volume, and archive it into S3. Using a tool like boto (Python API), you can automate the creation of snapshots and the process of moving the backups into S3.
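A rough sketch of that automation, written against boto3 (the current successor to the boto library mentioned above); the volume ID, bucket, and file names are placeholders, not values from this thread:

```python
# Hedged sketch: snapshot the EBS volume holding the SQLite file, and
# separately copy an exported backup file into S3. All identifiers are invented.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
s3 = boto3.client("s3")

# Snapshot the volume; EBS snapshots are stored by AWS behind the scenes.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly snapshot of the SQLite volume",
)
print("started snapshot", snapshot["SnapshotId"])

# Archive an exported copy of the database into a bucket as well.
s3.upload_file("backup/results.sqlite", "my-backup-bucket", "backups/results.sqlite")
```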

@Beringer: Thank you. So let me make sure I understand: can the EBS volume be accessed (and its contents modified / updated) from all 10 instances at the same time? – user3262424 Jun 10 at 4:41

If it is attached to an instance, you can use NFS to make it available to the other instances. – serialworm Jun 10 at 12:57

Or run MySQL (or another DB) on one of your instances.

To add to this, if you are concerned about the difficulty of managing a DBMS, Amazon offers a hosted MySQL service called RDS, which is quite simple to use. – TokenMacGuy Jun 10 at 6:49

@TokenMacGuy: thank you. The thing is, after my processing is over, I'd like to send over the results (= the SQLite file). While MySQL or so can help to store the results on Amazon's end -- how will I retrieve the results back to my computer? – user3262424 Jun 10 at 14:33
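If the results end up in MySQL/RDS rather than in a SQLite file, one hedged way to get them back to a local machine is to query the remote database directly. This sketch uses the third-party pymysql driver; the host, credentials, and table are all invented:

```python
# Hedged sketch: connect from your own computer to a MySQL/RDS endpoint and
# save a results table locally as CSV. Every identifier here is a placeholder.
import csv

import pymysql

conn = pymysql.connect(
    host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="secret",
    database="jobs",
)
try:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM results")
        with open("results.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cursor.description])  # header row
            writer.writerows(cursor.fetchall())
finally:
    conn.close()
```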

You can mount an S3 bucket on your Linux machine. See below: s3fs - code.google.com/p/s3fs/wiki/Installation... - this did work for me. It uses a FUSE file system + rsync to sync the files in S3. It keeps a copy of all filenames in the local system and makes them look like ordinary files/folders.

This is good if the system is already in place and running with a huge collection of data. But if you are building this from scratch, then I would suggest you have an EBS volume for SQLite and use this script to create a snapshot of your EBS volume: https://github.com/rakesh-sankar/Tools/blob/master/AmazonAWS/EBS/ebs-snapshot.sh

Thank you. The only thing is, I read somewhere that an EBS volume can only be mounted to one instance, while I want to have a volume that is shared among all 10 instances. How do I go about it? – user3262424 Jun 10 at 14:35

I doubt you can do that; Amazon EBS cannot be shared among different EC2 instances. But you can use an Amazon S3 bucket, use it as a drive, and share it with any number of EC2 instances you want. – RakeshS Jun 10 at 16:52

Thank you. So, from your comment, it looks like mounting an S3 bucket is the only way to share data across different EC2 instances (meaning, an EBS volume will not help in this case). Is that correct? – user3262424 Jun 10 at 22:19

Yes, you cannot share an EBS volume on different EC2 machines, but you can share an S3 bucket. – RakeshS 24 Jun3 at 4:18

Great, this clarifies things. – user3262424 Jun 12 at 3:25
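A hedged sketch of the pull-from-S3 pattern these comments converge on, using boto3: each of the 10 instances downloads its own copy of the database file from a shared bucket, processes only its assigned slice, and writes its result back to the bucket. The bucket name, key, table, and id ranges are all invented for illustration:

```python
# Hedged sketch: one worker instance pulls the shared SQLite file from S3,
# reads only its own id range (a read-only local copy, so no lock contention),
# and uploads a small result object back to the same bucket.
import sqlite3

import boto3

def run_worker(bucket, key, first_id, last_id, worker_id):
    s3 = boto3.client("s3")
    local_db = f"/tmp/worker-{worker_id}.sqlite"
    s3.download_file(bucket, key, local_db)  # every instance gets its own copy

    conn = sqlite3.connect(local_db)
    count = conn.execute(
        "SELECT COUNT(*) FROM records WHERE id BETWEEN ? AND ?",
        (first_id, last_id),
    ).fetchone()[0]
    conn.close()

    result_key = f"results/worker-{worker_id}.txt"
    s3.put_object(Bucket=bucket, Key=result_key, Body=str(count).encode())
    return result_key

if __name__ == "__main__":
    print(run_worker("shared-data-bucket", "data.sqlite", 0, 49999, worker_id=3))
```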
