Document Management System

From Open Clip Art Library Wiki

This section is deprecated. You can help by updating it, or possibly deleting it.


The Open Clip Art Library (OCAL) expects to maintain a large number of SVG documents, and we want an easy way to update and keep track of them. This page gives an overview of what a document management system is, the features it offers over other types of systems, and the work being done at OCAL to develop a solution for our needs.

Documents

What is a document?

A document is a set of one or more files, such as:

  • An SVG plus renderings of it into WMF, PNG, etc.
  • A presentation with translations into 5 other languages
  • An HTML report with several GIF graphics
  • 10 word processing documents that make up the 10 chapters of a book

Documents require the ability to:

  • Identify 'metadata' such as title, subject, author, etc.
  • Track past versions so you can see how the document changed over time
  • 'Lock' a document so only one author can change it at a time
  • Easily retrieve and store the document from the preferred editor
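
As a rough illustration of the four requirements above, here is a minimal Python sketch of a document record that carries metadata, keeps every past revision, and enforces a single-author lock. The class and method names (Document, LockError, store, retrieve) are hypothetical and are not taken from dms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class LockError(Exception):
    """Raised when someone other than the lock holder tries to change a document."""


@dataclass
class Document:
    # All names here are hypothetical; they only mirror the requirements listed above.
    metadata: Dict[str, str] = field(default_factory=dict)
    # Each revision maps a filename to its content, so one document can hold several files.
    revisions: List[Dict[str, bytes]] = field(default_factory=list)
    locked_by: Optional[str] = None

    def lock(self, author: str) -> None:
        """Give the lock to author unless somebody else already holds it."""
        if self.locked_by not in (None, author):
            raise LockError(f"already locked by {self.locked_by}")
        self.locked_by = author

    def unlock(self, author: str) -> None:
        if self.locked_by != author:
            raise LockError(f"{author} does not hold the lock")
        self.locked_by = None

    def store(self, author: str, files: Dict[str, bytes]) -> int:
        """Append a new revision and return its 1-based revision number."""
        if self.locked_by != author:
            raise LockError(f"{author} must lock the document before storing")
        self.revisions.append(dict(files))
        return len(self.revisions)

    def retrieve(self, revision: Optional[int] = None) -> Dict[str, bytes]:
        """Return the files of one revision (the latest by default)."""
        if not self.revisions:
            raise LookupError("document has no revisions yet")
        index = len(self.revisions) if revision is None else revision
        if not 1 <= index <= len(self.revisions):
            raise LookupError(f"no such revision: {revision}")
        return dict(self.revisions[index - 1])
```

Keeping whole revisions rather than diffs is a deliberate simplification here: binary and rendered formats do not diff well, and the sketch is only meant to show the shape of the data.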


What is a document management system?

A document management system is NOT:

  • A plain file system like JFS
  • A source code management system like CVS
  • A content management system like Plone

Any of the above systems is acceptable for managing a modest collection (fewer than 1000 documents),
but a dedicated document management system becomes handy when you need to:

  • Store millions of documents or terabytes of data
  • Store and access files in non-hierarchical fashions
  • Run arbitrary searches and quickly select sets of documents based on keywords or subject matter
  • Lock documents to particular authors
  • Perform bulk operations on (new) documents (thumbnailing, indexing, translating, OCRing, etc.)
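
To make the "non-hierarchical" and keyword-selection points concrete, here is a small Python sketch of an inverted index that maps keywords to document ids, so a set of documents can be selected without walking a directory tree. KeywordIndex and its methods are hypothetical names used only for illustration; a store holding millions of documents would keep such an index in a database rather than in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class KeywordIndex:
    """Inverted index: keyword -> set of document ids (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_keyword: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, keywords: Iterable[str]) -> None:
        """Register a document under each of its keywords (case-insensitive)."""
        for keyword in keywords:
            self._by_keyword[keyword.lower()].add(doc_id)

    def search(self, *keywords: str) -> Set[int]:
        """Return the ids of documents tagged with every given keyword."""
        if not keywords:
            return set()
        matches = [self._by_keyword.get(k.lower(), set()) for k in keywords]
        return set.intersection(*matches)
```

For example, after indexing a few clips, `index.search("animal", "cat")` returns only the ids tagged with both keywords, regardless of where the files live on disk.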


What document management systems already exist? Why a new one?

Browsing freshmeat.net shows that there have been a number of attempts at creating a document management system (including one of my own, http://docsys.sourceforge.net). A number are simply a web wrapper around a hierarchical file system, with a file uploader. Most are also web-oriented (many are CGI- or PHP-based) and thus cannot be easily scripted. This is a limitation: if you are dealing with tens of thousands of documents, you don't want to click through web forms to submit each and every file.

What we actually need is something with an easily scriptable API. A daemon-based service would work well, because the service could focus on managing the huge collection of files and leave the interface (web, GUI, command line, or other) to any number of client programs.



What is 'dms' / 'Document::Manager'?

dms provides a daemon-based service that encapsulates a document management system. Think of it like an email server, but instead of sending it emails with to/from/subject headers, you send it documents with metadata. The service is provided by the 'dmsd' daemon, which runs continuously, accepting connections from clients, processing requests, and maintaining the document store.

The protocol used for communicating with dmsd is the Simple Object Access Protocol (SOAP). SOAP is an XML-based protocol supported by a wide variety of languages, so you can build client interfaces in Perl, PHP, Java, Python, C++, or any other language that has a SOAP implementation. Perl's SOAP implementation is called SOAP::Lite, and it is being used to create simple command-line tools.
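
To give a feel for what travels over the wire, the following Python sketch wraps a method call in a SOAP 1.1 envelope using only the standard library. The envelope namespace is the standard SOAP 1.1 one; the method name ("query"), its parameters, and the urn:example-dms namespace are assumptions made for illustration, since the actual dmsd method names are not listed on this page.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
DMS_NS = "urn:example-dms"  # hypothetical; the real dmsd namespace is not given here


def build_request(method: str, params: dict) -> bytes:
    """Wrap one method call and its parameters (sent as text) in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{DMS_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(payload: bytes):
    """Recover (method, params) from an envelope built by build_request."""
    envelope = ET.fromstring(payload)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    return method, {child.tag: child.text for child in call}
```

In practice a SOAP library (such as SOAP::Lite on the Perl side) builds and parses these envelopes for you; the sketch only shows that the payload is plain XML that any language can produce.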

dmsd is a short, thin interface around the Document::Manager Perl module. This module defines the programmatic API that clients use, handles metadata, and implements the high-level functionality of the system as a wrapper around the lower-level functionality. That low-level functionality is implemented in Document::Repository, which contains the logic for maintaining the document repository itself and for checking individual files in and out. Document::Repository knows nothing about metadata.
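
The split between the two layers can be sketched in Python as follows. This is not the Perl API of Document::Manager or Document::Repository; the class and method names (Repository, Manager, submit, find, fetch) are hypothetical and only show the division of responsibilities: the lower layer stores file revisions, and the upper layer owns the metadata.

```python
from typing import Dict, List


class Repository:
    """Low-level store: revisions of file content per document id, no metadata."""

    def __init__(self) -> None:
        self._revisions: Dict[int, List[bytes]] = {}

    def add(self, content: bytes) -> int:
        """Store a new document and return its id."""
        doc_id = len(self._revisions) + 1
        self._revisions[doc_id] = [content]
        return doc_id

    def put(self, doc_id: int, content: bytes) -> int:
        """Check in a new revision and return its revision number."""
        self._revisions[doc_id].append(content)
        return len(self._revisions[doc_id])

    def get(self, doc_id: int) -> bytes:
        """Check out the latest revision."""
        return self._revisions[doc_id][-1]


class Manager:
    """High-level API: owns the metadata and delegates file storage."""

    def __init__(self, repository: Repository) -> None:
        self._repo = repository
        self._metadata: Dict[int, Dict[str, str]] = {}

    def submit(self, content: bytes, **metadata: str) -> int:
        doc_id = self._repo.add(content)
        self._metadata[doc_id] = dict(metadata)
        return doc_id

    def find(self, **criteria: str) -> List[int]:
        """Return ids whose metadata matches every given key/value pair."""
        return [
            doc_id
            for doc_id, meta in sorted(self._metadata.items())
            if all(meta.get(key) == value for key, value in criteria.items())
        ]

    def fetch(self, doc_id: int) -> bytes:
        return self._repo.get(doc_id)
```

Because the repository never sees metadata, it could be swapped for a different storage backend without touching the search logic in the manager.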

How is document management distinct from source or content management?

Source code systems such as CVS or Subversion are useful for managing collections of source files for building an application. They allow the user to track and manage changes made across all files within the hierarchy. However, these systems tend not to provide mechanisms for operating on the metadata of the individual files; for instance, they don't have mechanisms for selecting the set of files with a given subject or keyword. They also tend to have the hierarchical nature of their contents hard-coded. With source code you rarely need to suddenly reorganize all of the files to browse by author or title, but this is a very common need for collections of documents.

Also, from a more practical standpoint, source code management systems tend to be fairly technical in nature, since their users are technical folk. For non-technical users such as artists or business people, this complexity can be a major roadblock. And since many types of documents (such as binary or XML formats) aren't really amenable to line-based diffs, many of the strengths of source code management systems don't apply, so they tend to be overkill.

Content management systems are similar in some ways to source code management systems. They maintain the individual pieces of content in a hierarchy, track changes, and allow presentation of the collection as a whole. They differ from a source code management system in that they often include a 'state' for a piece of content - it may be published, retired, or scheduled for release on a particular date, for instance. Often, content management systems also track metadata for the individual pieces. However, such systems generally have a fixed notion of how the user will be presented with the collection of information - i.e. through a web browser.

Both source code and content management systems can be, and have been, used for managing documents, and for certain applications they are a very good match. However, in many circumstances they are overkill or inappropriate for the need. For instance, a business wishing to gather several million documents for policies, forms, and procedures together may find a source code system or a website content management system too cumbersome or too complex for its needs.

Document management systems are geared towards addressing these sorts of niches. For instance, a company may need to manage several terabytes of documents scanned from microfilm, or millions of patient record files.


Interface Ideas

Basic File Listing

doc_id   title    size    date    author


Newest Additions

This page lists the latest submissions and their status. We make the images available immediately, both to give the artist quick feedback that their image has been uploaded successfully and to encourage site visitors to review and rate new submissions. Images that have been marked down below a certain threshold will be suppressed from this list.

title    status

title is a link to the detail info page.
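
The selection rule for this list (newest first, with low-rated images suppressed) could look roughly like the Python sketch below. The Submission fields, the numeric rating scale, and the min_rating and limit parameters are assumptions; the page does not say how ratings are stored or where the threshold lies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Submission:
    title: str
    status: str
    submitted: datetime
    rating: float  # hypothetical average rating; the real scale is not specified


def newest_additions(items: List[Submission], min_rating: float, limit: int = 10) -> List[Submission]:
    """Return the newest submissions, hiding those rated below min_rating."""
    visible = [s for s in items if s.rating >= min_rating]
    visible.sort(key=lambda s: s.submitted, reverse=True)
    return visible[:limit]
```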


Most Frequently Downloaded this Month

List of all SVGs ordered by number of downloads (highest first)

title    num_downloads  [dl]

title is a link to the detail info page. This report would show up as a side box somewhere on the homepage. It would reset itself at the beginning of each month.
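
One way to get the monthly reset without a scheduled job is to count only the downloads recorded in the current calendar month. The Python sketch below assumes a download log of (title, date) pairs, which is a hypothetical format chosen for illustration.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple


def most_downloaded(downloads: Iterable[Tuple[str, date]], today: date, limit: int = 10) -> List[Tuple[str, int]]:
    """Rank titles by downloads made in the same calendar month as `today`."""
    counts = Counter(
        title
        for title, day in downloads
        if (day.year, day.month) == (today.year, today.month)
    )
    # most_common sorts by count, highest first.
    return counts.most_common(limit)
```

Because older entries are simply ignored, the report starts empty on the first day of each month while the full log stays available for other statistics.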


Author's Art List

The middle of the screen is a list of all SVGs submitted by the given artist, ranked by number of downloads (most popular first) or by age (newest first):

thumbnail    status   title    num_views    num_downloads

Side panels include:

About the Author Box

  • bio (optional)
  • links to blog, homepage, deviant art, etc. (optional)
  • contact info (optional)
  • Member since...

Statistics

  • Total number of submissions
  • Total number of views
  • Total number of downloads
  • Total number of SVGs in each state


SVG Document Detail Page

This page displays a single SVG and the details about it. In the center of the page is a preview of the image. Below it are a comment list and a short form for adding a comment about the image. This is not intended to be a sophisticated commenting system, just a quick way of jotting down comments about the image.

Other info included:

  • Metadata (author, title, keywords, etc.)
  • Number of downloads
  • Search terms that led to a view of this item
  • Validation tests (which SVG tools it works correctly in)
  • Wiki-style comments/descriptions

Links to do the following actions:

  • Report a problem
  • Change status
  • Rate it
  • Add/remove/edit keywords
  • Add a new revision of it (author only)


Clipart Requests

The current wiki page seems to be working fairly well, but possibly something more structured would be nice? This is probably lower priority for now. Ideas...

  • Include request 'age', and by default sort with newest requests at the top.
  • 'Renew' requests, so requestors can keep their request towards the top
  • Vote mechanism so multiple people can indicate desire for the same pieces.
  • Each requestor only allowed one active 'renew' per request
  • Separate list of "recently filled requests", perhaps with a short list on the homepage
  • Mechanism for the requestor to indicate if they feel the request has been adequately filled.
  • May need to allow expirations (must have it within X days, else don't bother).



Forms

Search forms

  • By keyword
  • By author
  • By date
  • By rating

Upload forms

  • Quick upload (requires valid/complete metadata in the SVG)
  • Advanced upload (allows adding metadata)
  • Bulk upload (submit a zip/tarball of multiple SVGs)

Package generation form

  • "Stock" filters (no Nazi imagery, no nudity, etc.)
  • Custom filters (only animals, only by author XYZ, etc.)
  • Package format (zip, tgz, etc.)
  • Internal structure (symlinks, no-symlinks, etc.)
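
The package generation options above can be thought of as a chain of filters followed by an archive step. The Python sketch below builds a zip in memory from the clips that pass every filter. The clip dictionary keys ("name", "author", "keywords", "svg") and the helper names are hypothetical, and only the zip format is shown.

```python
import io
import zipfile
from typing import Callable, Dict, Iterable, List

# A clip is a plain dict with hypothetical keys: "name", "author", "keywords", "svg".
Clip = Dict[str, object]
Filter = Callable[[Clip], bool]


def exclude_keywords(*banned: str) -> Filter:
    """'Stock' filter: reject clips tagged with any banned keyword."""
    return lambda clip: not set(banned) & set(clip["keywords"])


def only_author(author: str) -> Filter:
    """Custom filter: keep clips by a single author."""
    return lambda clip: clip["author"] == author


def build_zip(clips: Iterable[Clip], filters: List[Filter]) -> bytes:
    """Return a zip archive holding every clip that passes all filters."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for clip in clips:
            if all(keep(clip) for keep in filters):
                archive.writestr(f"{clip['name']}.svg", clip["svg"])
    return buffer.getvalue()
```

A form submission would then translate its checkboxes into a list of filters, for example `build_zip(clips, [exclude_keywords("nudity"), only_author("XYZ")])`.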

Where can I get DMS from?

Currently the most recent development version is in the Open Clip Art Library's CVS, as module dms:

cvs -d :ext:USERNAME@freedesktop.org:/cvs/clipart co dms

Check our CVS instructions for more help on CVS.

The most recent versions of the affiliated parts of DMS are available from the same CVS repository as these modules:

svg_metadata
dms-client-cgi

DMS and other parts of the system are also available on CPAN (http://cpan.perl.org), but the versions there are releases and not necessarily the most up to date.
