[Duraspace] Fwd: [Dspace-tech] Filterable document types in DSpace

helix84 helix84 at centrum.sk
Thu May 16 11:07:33 SAST 2013


Hi Hilton, it seems you're replying to my post on dspace-tech, but you
didn't CC your message to dspace-tech, so I'm attaching your full
message below.

On Thu, May 16, 2013 at 10:54 AM, Hilton Gibson <hilton.gibson at gmail.com> wrote:
> It seems the open formats work but not the closed ones like .xlsx and .docx
> etc..

It's exactly the opposite. The binary formats work, but the Office
Open XML ones (docx, xlsx, pptx) are not yet supported by the version
of Apache POI that DSpace uses. If I'm reading the Apache POI docs
correctly, a newer version does support them, so you may want to try
bumping the apache-poi dependency version number and using the new
filters. Let us know if you succeed.
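
If you want to check which of the two families a given bitstream belongs
to before testing the filters, here is a minimal sketch in Python. It is
not part of DSpace or Apache POI; the function name classify_office_file
is hypothetical. It relies only on two well-known facts: binary MS Office
files start with the OLE2 compound file signature, while Office Open XML
files are ZIP archives that contain a "[Content_Types].xml" entry.

```python
import io
import zipfile

# Signature of the OLE2 compound file container used by binary MS Office files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP local file header; Office Open XML files are ZIP archives.
ZIP_MAGIC = b"PK\x03\x04"


def classify_office_file(data: bytes) -> str:
    """Return "binary", "ooxml" or "unknown" for the given file content.

    Hypothetical helper for illustration only; it inspects the container
    format and does not extract any text.
    """
    if data.startswith(OLE2_MAGIC):
        return "binary"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                # Every Office Open XML package carries this part.
                if "[Content_Types].xml" in archive.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            pass
    return "unknown"
```

Call it as classify_office_file(open(path, "rb").read()). A result of
"ooxml" means the file is in the family that the POI version currently
bundled with DSpace does not handle.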


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette



On Thu, May 16, 2013 at 10:54 AM, Hilton Gibson <hilton.gibson at gmail.com> wrote:
> Hi All
>
> Important information: the DSpace software can extract/filter the content
> of some digital document formats.
> It seems the open formats work but not the closed ones like .xlsx and .docx
> etc..
> See:
> http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digitisation/Digital_Formats
> for more info.
>
> Cheers
>
> hg
>
> ---------- Forwarded message ----------
> From: helix84 <helix84 at centrum.sk>
> Date: 16 May 2013 10:45
> Subject: Re: [Dspace-tech] Filterable document types in DSpace
> To: "Thornton, Susan M. (LARC-B702)[LITES]" <susan.m.thornton at nasa.gov>
> Cc: "dspace-tech (dspace-tech at lists.sourceforge.net)"
> <dspace-tech at lists.sourceforge.net>
>
>
> Hi Sue,
>
> the supported formats are documented here (this is for 1.8 because 1.7
> didn't have this in docs, but I don't believe it changed between 1.7 and
> 1.8):
>
> https://wiki.duraspace.org/display/DSDOC18/Transforming%20DSpace%20Content%20(MediaFilters)
>
> To sum it up, binary MS Office files are filterable (incl. Excel), but not
> the Office Open XML (MS Office 2007+) formats (docx, xlsx, pptx). I'm not
> really sure about PDF/A since you're explicitly asking about it, but I don't
> think it would be any worse than the generic PDF support, i.e. IMHO it's
> supported. rtf doesn't seem to be supported.
>
>
> Regards,
> ~~helix84
>
> Compulsory reading: DSpace Mailing List Etiquette
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> List Etiquette:
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>
>
>
> --
> Hilton Gibson
> Systems Administrator
> JS Gericke Library
> Room 1025C
> Stellenbosch University
> Private Bag X5036
> Stellenbosch
> 7599
> South Africa
>
> Tel: +27 21 808 4100 | Cell: +27 84 646 4758
> http://library.sun.ac.za
> http://scholar.sun.ac.za
> http://ar1.sun.ac.za
> http://aj1.sun.ac.za
>
> _______________________________________________
> Duraspace mailing list
> Duraspace at lists.lib.sun.ac.za
> http://lists.lib.sun.ac.za/mailman/listinfo/duraspace
>
