[Irtalk] Improve your ranking by setting up your repository's sitemap correctly

Smith, Ina <ismith@sun.ac.za> ismith at sun.ac.za
Mon Dec 13 10:31:25 SAST 2010


Setting up sitemaps in DSpace: http://www.dspace.org/1_6_1Documentation/ch03.html

3.4.5. Google and HTML sitemaps

To help web crawlers index the content of your repository, you can use sitemaps. DSpace currently includes two forms of sitemap: Google sitemaps and HTML sitemaps.

Sitemaps allow DSpace to expose its content without crawlers having to crawl every page to discover it. HTML sitemaps list all items, collections and communities in HTML format, whilst Google sitemaps provide the same information in gzipped XML format.
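For readers unfamiliar with the format, the Python sketch below shows a gzipped XML sitemap at its simplest: a sitemaps.org urlset of loc entries, compressed with gzip and then read back. It is not DSpace's own generator; the handle URLs are hypothetical, and only the sitemaps.org 0.9 namespace and the urlset/url/loc elements come from the protocol.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemaps.org <urlset> document as UTF-8 bytes."""
    ET.register_namespace("", SITEMAP_NS)  # serialise without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def read_sitemap(gz_bytes):
    """Decompress a gzipped sitemap and return the <loc> URLs it lists."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [loc.text for loc in root.iter(f"{{{SITEMAP_NS}}}loc")]


# Hypothetical item URLs on the example host used in this section.
items = [
    "http://dspace.example.com/dspace/handle/123456789/1",
    "http://dspace.example.com/dspace/handle/123456789/2",
]
compressed = gzip.compress(build_sitemap(items))
print(read_sitemap(compressed))
```

The gzip step only reduces transfer size; crawlers decompress the file and read the same XML.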

To generate the sitemaps, run [dspace]/bin/generate-sitemaps. This creates the sitemaps in [dspace]/sitemaps/.

The sitemaps can be accessed from the following URLs:

* http://dspace.example.com/dspace/sitemap - Index sitemap
* http://dspace.example.com/dspace/sitemap?map=0 - First list of items (up to 50,000)
* http://dspace.example.com/dspace/sitemap?map=n - Subsequent lists of items (e.g. 50,001 to 100,000), etc.

HTML sitemaps follow the same pattern:

* http://dspace.example.com/dspace/htmlmap - Index sitemap
* etc.
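As a rough illustration of the numbering above, this Python sketch works out which ?map=n page lists a given item, assuming items are counted from 1 in the order the generator writes them and each Google sitemap page holds up to 50,000 of them. The item-number idea and the helper names are hypothetical; the per-page limit is a parameter because HTML sitemap pages are not guaranteed to use the same figure.

```python
from urllib.parse import urlencode

# Per the list above, each Google sitemap page holds up to 50,000 items.
# HTML sitemap pages may use a different limit, so it is a parameter here.
DEFAULT_ITEMS_PER_MAP = 50_000


def map_index(item_number, per_map=DEFAULT_ITEMS_PER_MAP):
    """Return n such that ?map=n lists the given 1-based item number."""
    if item_number < 1:
        raise ValueError("item numbers start at 1")
    return (item_number - 1) // per_map


def map_url(base, item_number, kind="sitemap", per_map=DEFAULT_ITEMS_PER_MAP):
    """Build the URL of the map page that lists an item ('sitemap' or 'htmlmap')."""
    query = urlencode({"map": map_index(item_number, per_map)})
    return f"{base}/{kind}?{query}"


BASE = "http://dspace.example.com/dspace"
print(map_url(BASE, 50_000))   # http://dspace.example.com/dspace/sitemap?map=0
print(map_url(BASE, 50_001))   # http://dspace.example.com/dspace/sitemap?map=1
```

Integer division of (item_number - 1) keeps items 1 to 50,000 on map=0, items 50,001 to 100,000 on map=1, and so on, matching the ranges listed above.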

When you run [dspace]/bin/generate-sitemaps, the script notifies Google that the sitemaps have been updated. For this notification to register, you must first register your Google sitemap index page (/dspace/sitemap) with Google at http://www.google.com/webmasters/sitemaps/. If your DSpace server needs an HTTP proxy to connect to the Internet, make sure you have set http.proxy.host and http.proxy.port in [dspace]/config/dspace.cfg.

The URL used to ping Google (and, in future, other search engines) is configured in [dspace]/config/dspace.cfg by the sitemap.engineurls setting, which takes a comma-separated list of URLs to 'ping'.
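The ping step can be sketched in Python as follows. It assumes that each entry in sitemap.engineurls is a URL prefix to which the URL-encoded sitemap index URL is appended; the example setting value and the proxy host are illustrative only, so check the values in your own dspace.cfg. The proxy handling mirrors the purpose of http.proxy.host and http.proxy.port and is not DSpace's actual code.

```python
import urllib.request
from urllib.parse import quote


def ping_urls(engineurls_setting, sitemap_url):
    """Expand a comma-separated sitemap.engineurls value into full ping URLs.

    Assumes each engine URL is a prefix to which the URL-encoded
    sitemap index URL is appended.
    """
    encoded = quote(sitemap_url, safe="")
    return [prefix.strip() + encoded
            for prefix in engineurls_setting.split(",")
            if prefix.strip()]


def make_opener(proxy_host=None, proxy_port=None):
    """Build an opener that routes requests through a proxy when one is set,
    playing the role of http.proxy.host / http.proxy.port."""
    if proxy_host and proxy_port:
        proxy = f"http://{proxy_host}:{proxy_port}"
        return urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener()


# Illustrative setting value; your dspace.cfg may differ.
setting = "http://www.google.com/webmasters/sitemaps/ping?sitemap="
for url in ping_urls(setting, "http://dspace.example.com/dspace/sitemap"):
    print(url)
    # To actually send the ping (hypothetical proxy host and port):
    # make_opener("proxy.example.com", 8080).open(url)
```

Encoding the sitemap URL with safe="" turns ':' and '/' into %3A and %2F, so it can sit safely inside the engine's query string.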

You can generate the sitemaps automatically every day using an additional cron job:

# Generate sitemaps
0 6 * * * [dspace]/bin/generate-sitemaps
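The five time fields of a cron entry are minute, hour, day of month, month and day of week, so "0 6 * * *" runs the script once a day at 06:00 server time. The small Python sketch below computes the next run of such a fixed-time daily entry; the helper name is hypothetical, it is not a general cron parser, and it uses naive local time, so daylight-saving changes are ignored.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour=6, minute=0):
    """Return the next time a daily cron entry 'minute hour * * *' fires after now."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


print(next_daily_run(datetime(2010, 12, 13, 10, 31)))  # 2010-12-14 06:00:00
print(next_daily_run(datetime(2010, 12, 13, 5, 59)))   # 2010-12-13 06:00:00
```

Running the job early in the morning means the sitemaps reflect the previous day's submissions before most crawler visits, though any quiet hour on your server will do.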

Ina Smith
E-Research Repository Manager | Library and Information Service | University of Stellenbosch | Private Bag X5036, 7599 | South Africa
http://scholar.sun.ac.za | http://oa.sun.ac.za | E-mail: ismith at sun.ac.za<mailto:ismith at sun.ac.za> | Tel:  +27 21 808 9139 | Skype: smith.ina | Office hours: Mo-Fr: 08h00-16h30

