Powered By Blogger

Monday, September 5, 2011

Content Storage and Lifecycle

http://wiki.alfresco.com/wiki/Content_Store_Configuration

In Alfresco, the content binaries are stored separately from the metadata, which is always found in the database. The primary metadata that acts as a reference to the binaries takes the form contentUrl=store://.........|mimetype=...(etc). The abstraction that takes care of mapping the store://... part of the reference to a physical location is the ContentStore interface.


Content Binaries and Transactions

Because binaries are not modified, it means that writes to the filesystem do not become visible until the metadata has been committed to the database. In the event of transaction failure or rollback, the metadata will be left in the pre-transaction state i.e. referencing the older binary; the newer content binary will be left in an orphaned state for later cleanup.

Deleting Files

When a file node (or anything containing a reference to raw content) is permanently deleted, there is just one less reference to the raw content. When there are no more references to some raw content, it is called orphaned. Were nothing further done, the content stores would just irreversibly fill up with content.

Cleaning up Orphaned Content (Purge)

Once all references to a content binary have been removed from the metadata, the content is said to be orphaned. Orphaned content can be deleted or purged from the content store while the system is running. Identifying and either sequestering or deleting the orphaned content is the job of the contentStoreCleaner.

In the default configuration, the contentStoreCleanerTrigger fires the contentStoreCleaner bean.

        ...               14                                                                                              
  • protectDays

Use this property to dictate the minimum time that content binaries should be kept in the contentStore. In the above example, if a file is created and immediately deleted, it will not be cleaned from the contentStore for at least 14 days. The value should be adjusted to account for backup strategies, average content size and available disk space. Setting this value to zero will result in a system warning as it breaks the transaction model and it is possible to lose content if the orphaned content cleaner runs whilst content is being loaded into the system. If the system backup strategy is just to make regular copies, then this value should also be greater than the number of days between successive backup runs.

  • store

This is a list of ContentStore beans to scour for orphaned content.

  • listeners

When orphaned content is located, these listeners are notified. In this example, the deletedContentBackupListener copies the orphaned content to a separate deletedContentStore.

Note that this configuration will not actually remove the files from the file system but rather moves them to the designated deletedContentStore, usually contentstore.deleted. The files can be removed from the deletedContentStore via script or cron job once an appropriate backup has been performed.



No comments:

Post a Comment