How do you store uploaded files in a filesystem?

One technique is to store the data in files named after the SHA1 hash of their contents. Such names are not easily guessable, any backup program should be able to handle them, and the store is easily sharded (by storing hashes starting with 0 on one machine, hashes starting with 1 on the next, etc.). The database would contain a mapping between the user's assigned name and the SHA1 hash of the contents.
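A minimal Python sketch of the content-hash scheme, assuming a single local storage root in which each one-character subdirectory (`0` to `f`) stands in for a separate shard machine. The function names `content_hash`, `storage_path` and `store` are illustrative, not part of the original answer.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """SHA1 of the file contents, as 40 lowercase hex characters."""
    return hashlib.sha1(data).hexdigest()


def storage_path(root: Path, digest: str) -> Path:
    """root/<first hex digit>/<digest>; the first digit selects the shard.

    Assumption: a local subdirectory stands in for "one machine per digit".
    """
    return root / digest[0] / digest


def store(data: bytes, root: Path) -> str:
    """Write data under its hash (only once) and return the hash."""
    digest = content_hash(data)
    path = storage_path(root, digest)
    if not path.exists():  # identical uploads end up sharing one file
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return digest
```

The caller would then save the pair (user's file name, returned hash) in the database. Because the path depends only on the contents, uploading the same bytes twice writes a single file.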

Computing the SHA1 hash is going to be a lot faster than the rate at which files are uploaded in any case. I have used this technique successfully in a high-volume application in the past. – Greg Hewgill Oct 22 '08 at 17:59

Guids for filenames, with an automatically expanding folder hierarchy that keeps no more than a couple of thousand files/folders in each folder. Backing up new files is done by backing up new folders. You haven't indicated what environment and/or programming language you are using, but here's a C# / .NET / Windows example:

    using System;
    using System.IO;
    using System.Xml.Serialization;

    /// <summary>
    /// Class for generating storage structure and file names for document storage.
    /// Copyright (c) 2008, Huagati Systems Co., Ltd.
    /// </summary>
    public class DocumentStorage
    {
        private static StorageDirectory _StorageDirectory = null;

        public static string GetNewUNCPath()
        {
            string storageDirectory = GetStorageDirectory();
            if (!storageDirectory.EndsWith("\\"))
            {
                storageDirectory += "\\";
            }
            return storageDirectory + GuidEx.NewSeqGuid().ToString() + ".data";
        }

        public static void SaveDocumentInfo(string documentPath, Document documentInfo)
        {
            //the FileStream object doesn't like NTFS streams, so this is disabled for now...
            return;

            //stores a document object in a separate "docinfo" stream attached to the file it belongs to
            //XmlSerializer ser = new XmlSerializer(typeof(Document));
            //string infoStream = documentPath + ":docinfo";
            //FileStream fs = new FileStream(infoStream, FileMode.Create);
            //ser.Serialize(fs, documentInfo);
            //fs.Flush();
            //fs.Close();
        }

        private static string GetStorageDirectory()
        {
            string storageRoot = ConfigSettings.DocumentStorageRoot;
            if (!storageRoot.EndsWith("\\"))
            {
                storageRoot += "\\";
            }

            //get storage directory if not set
            if (_StorageDirectory == null)
            {
                _StorageDirectory = new StorageDirectory();
                lock (_StorageDirectory)
                {
                    string path = ConfigSettings.ReadSettingString("CurrentDocumentStoragePath");
                    if (path == null)
                    {
                        //no storage tree created yet, create first set of subfolders
                        path = CreateStorageDirectory(storageRoot, 1);
                        _StorageDirectory.FullPath = path.Substring(storageRoot.Length);
                        ConfigSettings.WriteSettingString("CurrentDocumentStoragePath", _StorageDirectory.FullPath);
                    }
                    else
                    {
                        _StorageDirectory.FullPath = path;
                    }
                }
            }

            int fileCount = (new DirectoryInfo(storageRoot + _StorageDirectory.FullPath)).GetFiles().Length;
            if (fileCount > ConfigSettings.FolderContentLimitFiles)
            {
                //if the directory has exceeded the number of files per directory, create a new one...
                lock (_StorageDirectory)
                {
                    string path = GetNewStorageFolder(storageRoot + _StorageDirectory.FullPath, ConfigSettings.DocumentStorageDepth);
                    _StorageDirectory.FullPath = path.Substring(storageRoot.Length);
                    ConfigSettings.WriteSettingString("CurrentDocumentStoragePath", _StorageDirectory.FullPath);
                }
            }
            return storageRoot + _StorageDirectory.FullPath;
        }

        private static string GetNewStorageFolder(string currentPath, int currentDepth)
        {
            string parentFolder = currentPath.Substring(0, currentPath.LastIndexOf("\\"));
            int parentFolderFolderCount = (new DirectoryInfo(parentFolder)).GetDirectories().Length;
            if (parentFolderFolderCount ...

            // ... (part of the listing is missing here) ...

            if (!currentDir.EndsWith("\\")) { currentDir += "\\"; }
            Directory.CreateDirectory(currentDir + directoryName);
            if (currentDepth ...

            // ... (part of the listing is missing here, up to the getter of StorageDirectory.FullPath) ...

                get
                {
                    if (ParentDirectory != null)
                    {
                        return ParentDirectory.FullPath + "\\" + DirectoryName;
                    }
                    else
                    {
                        return DirectoryName;
                    }
                }
                set
                {
                    if (value.Contains("\\"))
                    {
                        DirectoryName = value.Substring(value.LastIndexOf("\\") + 1);
                        ParentDirectory = new StorageDirectory { FullPath = value.Substring(0, value.LastIndexOf("\\")) };
                    }
                    else
                    {
                        DirectoryName = value;
                    }
                }
            }
        }
    }
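For comparison, a much-simplified Python sketch of the GUID approach (not a port of the C# listing): it keeps only one level of folders where the C# nests them to a configurable depth, holds the current folder in memory instead of persisting it to a setting, and uses random `uuid4` names rather than sequential GUIDs. `ExpandingStore`, `new_path` and the default limit of 2000 files per folder are hypothetical choices.

```python
import uuid
from pathlib import Path
from typing import Optional


class ExpandingStore:
    """Hands out GUID-named file paths, starting a new GUID-named folder
    whenever the current one already holds max_files_per_folder files."""

    def __init__(self, root: Path, max_files_per_folder: int = 2000):
        self.root = root
        self.max_files = max_files_per_folder
        self.current: Optional[Path] = None  # folder currently being filled

    def _folder_is_full(self) -> bool:
        return sum(1 for _ in self.current.iterdir()) >= self.max_files

    def new_path(self, suffix: str = ".data") -> Path:
        """Return a fresh path; the caller writes the uploaded bytes there."""
        if self.current is None or self._folder_is_full():
            self.current = self.root / uuid.uuid4().hex
            self.current.mkdir(parents=True)
        return self.current / (uuid.uuid4().hex + suffix)
```

Once a folder reaches the limit it never receives another file, so an incremental backup only needs to pick up the folders created since the last run.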

SHA1 hash of the filename + a salt (or, if you want, of the file contents; that makes detecting duplicate files easier, but also puts a LOT more stress on the server, because every file has to be read in full and hashed). This may need some tweaking to be unique (e.g. add the uploader's UserID or a timestamp), and the salt is there to make the name not guessable.

The folder structure is then built from parts of the hash. For example, if the hash is "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12", then the folders and the file could be:

    /2/
    /2/2f/
    /2/2f/2fd/
    /2/2f/2fd/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12

This is to prevent large folders (some operating systems have trouble enumerating folders with a million files), hence the few subfolders for parts of the hash.

How many levels? That depends on how many files you expect, but 2 or 3 is usually reasonable.
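A Python sketch of the salted-hash name and the prefix folders. The inputs (file name, uploader ID, timestamp) and the `salt` parameter are assumptions standing in for whatever makes a name unique in a given application; the salt is assumed to be a secret kept on the server. `hashed_name` and `nested_path` are hypothetical helpers.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, uploader_id: str, timestamp: str, salt: bytes) -> str:
    """Salted SHA1 over the file name plus extra data that makes it unique."""
    h = hashlib.sha1(salt)
    for part in (filename, uploader_id, timestamp):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def nested_path(digest: str, levels: int = 3) -> PurePosixPath:
    """'2fd4e1...' -> 2/2f/2fd/2fd4e1... (relative to the upload root)."""
    prefixes = [digest[:i] for i in range(1, levels + 1)]
    return PurePosixPath(*prefixes, digest)
```

Each level adds one hex digit, so every folder holds at most 16 subfolders. With three levels there are 16 x 16 x 16 = 4096 leaf folders, so a million files works out to roughly 244 files per folder; with two levels (256 leaf folders) it would be about 3,900 per folder.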

Just in terms of one aspect of your question (security): the best way to safely store uploaded files in a filesystem is to keep them outside the webroot (i.e., you can't access them directly via a URL; you have to go through a script). This gives you complete control over who can download what (security) and allows for things such as logging.

Of course, you have to ensure the script itself is secure, but it means only the people you allow will be able to download certain files.
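A minimal sketch of the lookup step such a script would perform, in Python with only the standard library. `STORAGE_ROOT`, the `can_read` permission callback and the flat one-file-per-ID layout are assumptions; actually streaming the file back (and choosing response headers) is left to whatever web framework runs the script.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("downloads")

# Hypothetical location, deliberately outside the web server's document root.
STORAGE_ROOT = Path("/var/app-uploads")


def resolve_download(storage_root: Path, user: str, file_id: str,
                     can_read: Callable[[str, str], bool]) -> Optional[Path]:
    """Return the file to send for this request, or None to refuse it."""
    if not can_read(user, file_id):
        log.warning("denied: user=%s file=%s", user, file_id)
        return None
    root = storage_root.resolve()
    candidate = (root / file_id).resolve()
    # Only accept a direct child of the storage root; this rejects "../",
    # absolute paths, nested paths and symlinks pointing elsewhere.
    if candidate.parent != root:
        log.warning("rejected path: user=%s file=%r", user, file_id)
        return None
    log.info("download: user=%s file=%s", user, file_id)
    return candidate
```

In the script, a `None` result would become an error response (403 or 404), and the log lines give the audit trail mentioned above.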

Expanding on Phill Sacre's answer, another aspect of security is to use a separate domain name for uploaded files (for instance, Wikipedia uses upload.wikimedia.org) and to make sure that domain cannot read any of your site's cookies. This prevents people from uploading an HTML file with a script that steals your users' session cookies. Simply setting the Content-Type header isn't enough, because some browsers are known to ignore it and guess the type from the file's contents; HTML can also be embedded in other kinds of files, so it's not trivial to detect and disallow it.
