Igeneric Thoughts Archives: Search


Amazon's Jeff Bezos announces OpenSearch; a next step in search syndication?


March 16, 2018

JupiterResearch's Gary Stein draws my attention to an interesting addition to Amazon's A9 search engine, pointing to an article in eWeek yesterday.

The addition is called OpenSearch, and is described as

“a collection of technologies, all built on top of popular open standards, to allow content providers to publish their search results in a format suitable for syndication. You can see how this works on A9.com.

Many sites today return search results as a tightly integrated part of the website itself. Unfortunately, those search results can't be easily reused or made available elsewhere, as they are usually wrapped in HTML and don't follow any one convention. OpenSearch offers an alternative: an open format that will enable those search results to be displayed anywhere, anytime. Rather than introduce yet another proprietary or closed protocol, OpenSearch is a straightforward and backward-compatible extension of RSS 2.0, the widely adopted XML-based format for content syndication.

Any site that has content—and a search box—can choose to return results in OpenSearch RSS. This includes travel sites, classifieds, encyclopedias…. If you can provide search results for something, it probably can fit into the OpenSearch model. Returning OpenSearch results is easy—the format is the standard set of XML elements, plus three additional elements designed to support navigation between pages.”
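To make the description above a little more concrete, here is a minimal sketch in Python of reading such a feed. It assumes that the three additional paging elements are openSearch:totalResults, openSearch:startIndex and openSearch:itemsPerPage, and that they live in the namespace shown; the sample feed, its URLs and the function name are invented purely for illustration, so check the published specification before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the OpenSearch RSS 1.0 extension elements.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearchrss/1.0/"

# Hypothetical feed: ordinary RSS 2.0 plus the three paging elements.
SAMPLE_FEED = f"""<rss version="2.0" xmlns:openSearch="{OPENSEARCH_NS}">
  <channel>
    <title>Example search: roman villa</title>
    <link>http://example.org/search?q=roman+villa</link>
    <description>Hypothetical results feed</description>
    <openSearch:totalResults>42</openSearch:totalResults>
    <openSearch:startIndex>1</openSearch:startIndex>
    <openSearch:itemsPerPage>2</openSearch:itemsPerPage>
    <item>
      <title>First result</title>
      <link>http://example.org/records/1</link>
    </item>
    <item>
      <title>Second result</title>
      <link>http://example.org/records/2</link>
    </item>
  </channel>
</rss>
"""


def parse_opensearch_rss(xml_text):
    """Return the paging details and the (title, link) pairs from a feed."""
    channel = ET.fromstring(xml_text).find("channel")
    ns = {"os": OPENSEARCH_NS}

    def paging(name):
        # Each paging element holds a plain integer.
        return int(channel.findtext(f"os:{name}", namespaces=ns))

    return {
        "total": paging("totalResults"),
        "start": paging("startIndex"),
        "per_page": paging("itemsPerPage"),
        "items": [
            (item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")
        ],
    }


if __name__ == "__main__":
    results = parse_opensearch_rss(SAMPLE_FEED)
    print(results["total"], "results;", len(results["items"]), "on this page")
```

Because the feed is still ordinary RSS 2.0, an existing feed reader that ignores the extra elements should continue to display the items, which is presumably what "backward-compatible" means here.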

Getting your search functions off your site and visible where your users actually are is an area of ongoing interest to the Igeneric. We already have library system vendors offering assorted cross-search solutions within their products (and bodies like NISO trying to harmonise this). We also have more lightweight solutions, such as the excellent Remote Search Interface from the ADS that I've written about before.

The list of 'columns' already available to A9 includes organisations like the British Library. How might this move forward, and is there value in aggressively pushing organisations to register here...?

Ever-reliable, Talis have taken a look, liked it, and done something with OpenSearch...

Posted by Oliver Smith-Toynes at 14:14 | Make or Read Comments(0) | TrackBack (0)

Access RedLightGreen from within Firefox


February 24, 2018


A posting to ResourceShelf draws my attention to the availability of a Firefox search plug-in for RedLightGreen.

As mentioned before, Firefox includes the capability to submit searches to a variety of search engines without having to start from their web sites.

RedLightGreen is an interesting service from the Research Libraries Group (RLG) which allows you to search over 120,000,000 books and discover whether or not they are available in a library close to you.

Posted by Oliver Smith-Toynes at 14:35 | Make or Read Comments(0) | TrackBack (0)

Semantic Web searching from the University of Southampton


February 17, 2018

A piece in The Register today covers a Semantic Web project from Southampton called mSpace.

The application contextualises a search topic - such as Classical music - by gathering information together from a range of sources and allowing the user to group and sort it in different ways. Someone searching for a piece of music by Mozart, for example, might also find other pieces by the same composer and his contemporaries, as well as contextual information.

Quoting from a University of Southampton Press Release:

“An example of the framework in action is the mSpace Classical Music Browser. The browser brings together audio, text, links, and images about the domain in a way that people can explore and reorganize as they wish. The default view presents three columns: Era, Composer and Piece. Selecting an entry in Era, like 'Romantic', brings up information about the Romantic period, as well as audio samples associated with it; it also then shows the composers in that period. Hovering over the names of the composers immediately plays back samples of their work and clicking on a composer provides more information from an overview, to extra links, all in one window.”

Posted by Oliver Smith-Toynes at 21:43 | Make or Read Comments(0) | TrackBack (0)

Microsoft launches search engine


February 1, 2018

The BBC and others report that Microsoft today launched the 'finished' version of their search engine.

According to the BBC, Microsoft claim to have an index of 5 billion pages, refreshed every two days.

They are also pushing the search engine's ability to answer factual questions directly, rather than sending the user to a web page containing the answer. The other search engines have done some of this for a while. Entering “define computer” into Google, for example, returns a dictionary definition. “define:computer” returns a set of definitions garnered from the wider Web.


The Microsoft approach appears to draw upon their Encarta encyclopaedia, and users are encouraged to ask questions.

On the whole, it works quite well. “What is the capital of Scotland” returns the right answer, as does “Where is York”, “Who was Hadrian” and “What is permafrost”. The system struggles on some fairly likely question types; “Who is the Prime Minister” misunderstands my question, whilst the more explicit “What is the Name of the Prime Minister” isn't treated by the search engine as a request for a fact.

Answers can be a little terse, though, and the link for supplemental information often arrives at a page for the Premium version of Encarta, yours for £19.99 per year, plus taxes. Microsoft are giving people free access to this part of the site for a limited time, though. Also, as the screen shot illustrates, answers to questions can appear a little buried under the targeted ads. If I'd wanted adverts, wouldn't I have said “Show me adverts from the capital of Scotland”?!

Whilst an improvement over some of their earlier beta releases, is it good enough to make people change their searching habits? Microsoft may, of course, not really think that they can change the site that people visit to conduct a search; they may very well intend to simply remove the 'need' to make such a choice, by embedding their new search engine in future versions of Windows. If a search box is right there on the desktop, why go anywhere else?

Posted by Oliver Smith-Toynes at 09:48 | Make or Read Comments(0) | TrackBack (0)

Search television programmes with Google


January 25, 2018

In a move that is similar to David Dawson's earlier post on the UK-based Blinkx, Google yesterday rolled out a prototype television search.

Quoting from the article on CNET News,

“As previously reported, the Mountain View, Calif.-based company has been quietly developing Google Video, an engine that lets people search over the text of TV shows. Immediately, the service will scour programming from PBS, Fox News, C-SPAN, ABC, and the NBA, among others, making broadcasts searchable the same day.

People can search on a term--such as Indonesian tsunami--to find the TV shows in which it was mentioned, a still image of the video and closed-captioning text of that particular segment of the program.”

Content is currently only available from some US networks, and you can only view a still rather than view the video, but it marks another step forward in making the search engine ever more central to our online and offline lives.

Story from a CNET News RSS feed.

09:45 update: Yahoo! are doing it too, and have had a beta service available since December.

Posted by Oliver Smith-Toynes at 09:48 | Make or Read Comments(0) | TrackBack (1)

Pew report on Search Engine use


January 24, 2018

The Pew Internet & American Life Project released another report (PDF download) over the weekend, this time looking at users of Search Engines.

Quoting from the release,

“Internet users are extremely positive about search engines and the experiences they have when searching the internet. But these same satisfied internet users are generally unsophisticated about why and how they use search engines. They are also strikingly unaware of how search engines operate and how they present their results.

Internet users behave conservatively as searchers: They tend to settle quickly on a single search engine and then stick with it, rather than switching as search technology evolves or comparing results from different search systems. Some 44% of searchers regularly use just one engine, and another 48% use just two or three. Nearly half of searchers use a search engine no more than a few times a week, and two-thirds say they could walk away from search engines without upsetting their lives very much.

Internet users trust their favorite search engines, but few say they are aware of the financial incentives that affect how search engines perform and how they present their search results.

Only 38% of users are aware of the distinction between paid or 'sponsored' results and unpaid results. And only one in six say they can always tell which results are paid or sponsored and which are not. This finding is ironic, since nearly half of all users say they would stop using search engines if they thought engines were not being clear about how they presented paid results.”

We are seeing broadly similar responses from that area of our MORI work, which we should be publishing shortly.

Posted by Oliver Smith-Toynes at 09:43 | Make or Read Comments(0) | TrackBack (0)

National Library for Health launches single search


January 11, 2018

The National Library for Health, directed by Ben Toth of Igeneric member the National Electronic Library for Health, yesterday launched its federated search solution.

This service allows health professionals and others to submit a single search and have it submitted to a range of databases and other resources, including various repositories of guidelines, a number of specialist libraries and collections of electronic journals held centrally and across the NHS' English library services.

It fulfils a different need to the Google Search Appliance-powered search of websites in the nhs.uk area of the Internet, launched last month, as it concentrates upon defined sets of 'library' material.

It would be interesting, in some future phase of National Library for Health development, to look at ways in which the power of each might be harnessed to deliver integrated results to users.

The search is powered by Fretwell Downing's ZPORTAL product, which uses a range of standards to search across a distributed set of databases.

The National Electronic Library for Health now describes itself as “part of” the National Library for Health.

Information from the National Library for Health weblog, also reported more fully via PublicTechnology.net.

Posted by Oliver Smith-Toynes at 16:28 | Make or Read Comments(0) | TrackBack (1)

BBC Search engine goes from strength to strength in 2004


January 10, 2018

BBC News today carries an item on the BBC's Internet Search Engine, reporting that it handled more than 277,000,000 queries in 2004.

The search engine, visible in the top-left corner of the BBC Home Page, offers searchers the ability to search just the BBC News site, all of the BBC, or the whole Internet.

One advantage of the BBC search is that “BBC Recommended” sites are often included with the results, offering guidance to those drowning under vast numbers of hits.

There is also a version of the engine for 6-12 year olds, where additional effort has been taken to recommend sites appropriate to the age group, and to exclude inappropriate content. I know at least one six year old for whom it regularly proves extremely helpful, but we must do something about getting the Igeneric approved!

It would be nice, though, if the BBC Search Engine had an addressable page of its own (as the children's version does), rather than having to direct people to the BBC Home page to use it.

Posted by Oliver Smith-Toynes at 10:06 | Make or Read Comments(0) | TrackBack (0)

Harnessing the power of Google


January 4, 2018

Writing in his personal Blog, IPPR's Will Davies draws my attention to directionlessgov.

Put together by the people behind TheyWorkForYou.com and similar sites, it's styled as a spoof of the UK Government's DirectGov service, and demonstrates that reasonable public sector results can be returned from a simple search of Google, constrained to the gov.uk domain.
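The underlying trick is easy enough to sketch. The Python below builds a Google search URL with the query restricted to a domain using the site: operator. I have no knowledge of how directionlessgov itself is implemented, so treat the function name and the URL pattern as my own assumptions rather than a description of their code.

```python
from urllib.parse import urlencode


def site_restricted_search_url(terms, domain="gov.uk"):
    """Build a Google search URL limited to one domain (illustrative only)."""
    # The site: operator asks the search engine to return results
    # only from the given domain and its subdomains.
    query = f"{terms} site:{domain}"
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_restricted_search_url("council tax"))
```

Swapping the domain argument for nhs.uk, or for a single department's site, would narrow the search in the same way.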

The UK's NHS has, of course, already adopted Google search functionality, and we are looking at the feasibility of similar activities across other Igeneric member organisations.

Google is there. Google is used. Google is recognised. Google works. We should, as Will says, recognise and embrace that. A Google-style search, though, is not the be-all and end-all, and there is clearly an important role for developments such as those behind DirectGov, allowing us not merely to search across the mass of Government information online, but to begin to knit information from different sources together in ways that help the Citizen to build up a more complete picture.

Posted by Oliver Smith-Toynes at 13:48 | Make or Read Comments(0) | TrackBack (0)

Google working with academic libraries to digitise books


December 14, 2004

In a move related to Google Scholar (providing access to scholarly material online, often via a commercial publisher or content aggregator) and Google Print (allowing users to search printed texts, usually with a view to then selling them a copy of the physical book), CNET News discusses Google's expected announcement of a partnership with a number of leading academic libraries to digitise - and make available - a significant body of older printed works.

According to the article, Google is today expected to announce partnerships with five major libraries, including Oxford University's. The different libraries are allowing Google access to varying quantities of material. Oxford is believed to be providing access to all of their books published on or before 1900.

SearchEngineWatch also covers the story, and reminds us that other people are digitising books, including Project Gutenberg and the Internet Archive. It's potentially fundamentally different, though, to have content from those books coming back in results from your search engine.

Posted by Oliver Smith-Toynes at 09:37 | Make or Read Comments(0) | TrackBack (1)

NHS uses Google to improve searches


December 13, 2004

The United Kingdom's National Health Service (NHS) has quietly added Google to their web site, allowing users to search over 60,000 documents from sites across the nhs.uk domain.

This functionality is offered by Google's new Search Appliance, and gives those used to Google-style searching a familiar interface whilst ensuring that only authoritative results are returned.

As well as returning all documents relevant to the search term, the NHS Search initially offers a set of “Suggested Links”; resources most likely to usefully assist the searcher. The BBC does something similar on their search engine, and it seems a much more helpful use of search engine intelligence than those annoying ads that appear on the main search engines.

Posted by Oliver Smith-Toynes at 09:52 | Make or Read Comments(0) | TrackBack (0)

Google tries to anticipate your search


December 12, 2004

CNET News draws my attention to the quiet launch of Google Suggest, a developing service which attempts to complete your search term as it is typed, in a similar fashion to the auto-complete feature found in Word and similar products.

I particularly like the way that Google shows you the number of hits likely to be returned for each term, in a manner very similar to that employed in Adiuri's faceted classification technology (an integral part of our developing Place demonstrator).
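As a rough illustration of the idea, and certainly not of how Google or Adiuri actually do it, the Python sketch below completes a partial term against a small table of known terms and shows the number of hits for each suggestion. The terms, the counts and the function name are all made up.

```python
# Hypothetical table of known search terms and their hit counts.
HIT_COUNTS = {
    "roman villa": 120,
    "roman road": 340,
    "romanesque": 75,
    "bronze age": 90,
}


def suggest(prefix, hit_counts, limit=5):
    """Return up to `limit` (term, hits) pairs that start with `prefix`."""
    prefix = prefix.lower()
    matches = [
        (term, hits) for term, hits in hit_counts.items()
        if term.startswith(prefix)
    ]
    # Most hits first; ties are broken alphabetically.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]


if __name__ == "__main__":
    for term, hits in suggest("rom", HIT_COUNTS):
        print(f"{term} ({hits} results)")
```

Showing the count alongside each suggestion lets the searcher see, before committing, whether a term is likely to return a handful of results or several hundred.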

Posted by Oliver Smith-Toynes at 22:03 | Make or Read Comments(0) | TrackBack (0)

A new portal searches the archaeology of Europe


December 3, 2004

This feels like a very ADS-heavy day of posting, but then they have been very busy, they do some good stuff, and they take the time to tell me about it!

The ADS is a partner in a European project called ARENA (Archaeological Records of Europe - Networked Access), along with agencies from Denmark, Norway, Iceland, Romania and Poland.

They've just released the ARENA Portal, which cross-searches holdings from all six partners and displays some innovative touches worthy of mention.


The interface appears fully multi-lingual, and has made some interesting attempts to cater not just for the different languages, but also for different notions of time to reflect the fact that 'Roman', for example, spans very different dates in each of the six countries (indeed, the concept is only relevant in three).

As well as time-based searches, the Portal supports subject and place-based querying, with a unified high-level subject list applied to records from all of the partners, and a reasonably intuitive map-based search interface. I especially like the ability to overlay the map with information on the density of data available in the area you've selected, although it would be nice to be able to pan around neighbouring areas of the map, rather than having to move back up to a less detailed view first.

Posted by Oliver Smith-Toynes at 15:42 | Make or Read Comments(0) | TrackBack (5)

Adding value to Google Scholar with Open URL


December 3, 2004

Andy Powell at UKOLN has delivered another gem, building upon a piece of work by Peter Binkley at the University of Alberta in Canada.

As discussed last month, Google Scholar is a new offering from Google which searches scholarly material. Results from one of these searches often point to access-controlled copies of journal articles, and may fail to cope adequately with the 'appropriate copy' problem. Published articles from scholarly journals often exist online in more than one place, and access to one of those copies is often available to members of universities and similar institutions through an institutional subscription with a content aggregator like Ingenta. If Google Scholar returns the 'wrong' copy, the user may be presented with an authentication challenge they cannot meet, or a charge that they need not have paid.

Peter's solution is a simple extension to the increasingly popular Firefox web browser, which adds a link to an OpenURL resolver to results returned by Google Scholar. This allows users at institutions with an active resolver to be directed to the 'appropriate copy'; the one for which their institution has already negotiated access on their behalf.
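For anyone unfamiliar with OpenURL, the link that gets added is essentially a citation packed into a query string and pointed at the institution's resolver. The Python sketch below shows the general shape, assuming OpenURL 0.1-style keys such as genre, issn, volume, spage and date. The resolver address and the citation are invented, and this is not the code from Peter's extension.

```python
from urllib.parse import urlencode


def openurl_link(resolver_base, citation):
    """Build an OpenURL-style link from a dict of citation fields."""
    # Leave out any fields that are empty or unknown.
    params = {key: value for key, value in citation.items() if value}
    return f"{resolver_base}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical resolver and citation, for illustration only.
    link = openurl_link(
        "http://resolver.example.ac.uk/openurl",
        {
            "genre": "article",
            "issn": "1234-5678",
            "volume": "12",
            "spage": "34",
            "date": "2004",
        },
    )
    print(link)
```

The same citation sent to a different institution's resolver would lead to whichever copy that institution has licensed, which is the whole point of the 'appropriate copy' approach.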

Andy's modification is to create a version of this for the UK's OpenURL router service, which should work for most UK-based users. If your institution doesn't have its own resolver, the router will direct you to LitLink at MIMAS or getCopy at EDINA.

To get the tool, launch Firefox and direct it to http://www.ukoln.ac.uk/distributed-systems/openurl/googlescholar/googlescholaropenurl.xpi, then follow the instructions.

Andy's announcement came via the jisc-development mailing list. See also a post at TechnoBiblio and one at the Distant Librarian for more on Peter's original tool.

Posted by Oliver Smith-Toynes at 14:30 | Make or Read Comments(1) | TrackBack (1)

Delivering content to the user...


December 3, 2004

...rather than making them come to you!

One of the changes for which the Igeneric and its partners continue to advocate is diminishing the need for those interested in 'your' content to have to find you, visit your site, and then locate the items of interest to them.

This is clearly a complex area, and one beset by concepts of brand, maintenance of control and a perception that it runs counter to current metrics from funders, obsessed as so many of them are with measuring hits on websites as a demonstration of success.

Despite these obstacles, though, it would appear to be a vital part of any strategy to broaden access to resources, or to assist a member of the public in building a coherent picture of their topic of interest, based upon the holdings of various organisations.

A Common Information Environment member, the Joint Information Systems Committee (JISC), is currently funding one project in this area. Contextual Resource Evaluation Environment (CREE) is a collaboration between the University of Hull, the Archaeology Data Service, EDINA, the University of Oxford, and Newark & Sherwood College. One of the areas it is addressing is the production and dissemination of 'portlets'; small Web Services suitable for deploying in portals to deliver aspects of a service (such as the ADS Catalogue) to the user of the portal, in the portal. One of these portlets might conceivably respect a user's personalisation details from the host portal, feed results from one portlet to another, etc.

As a by-product of this richer work on CREE, the ADS has also produced a simple HTML code fragment, suitable for inserting onto any web site. The code permits a user to select the type of search they wish to perform, enter a query, and launch a search upon the ADS Catalogue itself. The user is directed to a page of the ADS' standard terms and conditions and, upon accepting them, receives the result of their query.

This is extremely simple, but also potentially powerful in allowing access to the catalogue from a huge number of new locations. The University of Glasgow's Archaeology Department, for example, includes this function on their own pages, and there's no reason for others not to do likewise.

Providing the capability to access all of a resource from other places has value. The real potential of an application such as this, however, is in tailoring access in order to more directly cater to the needs of the site upon which this interface is offered.

Most simply, the interface can be modified in order to provide access to a subset of the ADS Catalogue's holdings. This catalogue contains a large number of collections contributed by organisations across the UK, and beyond. Some of these organisations deposit their content with the ADS for archival purposes, whilst others provide material expressly so that it can be seen and used. For these organisations, especially, the ability to provide access to their own material from their own website has many advantages. Here, for example, the interface carries different branding, and the scope of the search has been redefined to only search records from the Defence of Britain project, held by the ADS.

Here, the same interface can be seen as it appears on their website. It might equally appear on the site of the organisation responsible for funding the project, or on the site of other relevant organisations such as the Imperial War Museum.

A similar model offers great potential in delivering content to organisations less directly linked with the topic matter. VisitScotland, for example, probably has little interest in offering access to the ADS catalogue. They might, though, see the value of allowing prospective visitors to Scotland to search for information on historic sites and monuments to be seen in the Scottish landscape. The English Regional Agencies, funded by another Igeneric member (MLA), also probably have little interest in providing access to the totality of the ADS catalogue. An easily installed search function on their home page, allowing visitors to their site to search for content from across all of the local authorities within each of their boundaries, though, becomes quite compelling. Both of these become easy to achieve using technology such as this. It simply requires a change in order to restrict the subset of the catalogue that is searched by geography (all of Scotland in the first example, or all of the local government units within a single MLA Regional Agency in the second) rather than by collection.

The same could be done by period (a search of Roman material held in the ADS catalogue, appearing on the web site of a BBC programme about the Romans for schoolchildren) and, presumably, along any other lines in which a set of criteria can be pre-defined which meaningfully restrict searches to some coherent subset of the whole.
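To show how little is involved in this sort of pre-scoping, the Python sketch below generates a small HTML search form in which a hidden field restricts the search to a fixed subset. The catalogue address, the field names and the function are my own inventions for illustration; the real ADS fragment may be built quite differently.

```python
from html import escape


def search_fragment(action_url, scope_field=None, scope_value=None,
                    label="Search the catalogue"):
    """Return an HTML form that submits a query to a remote catalogue.

    If a scope is given, a hidden field restricts every search launched
    from this form to that subset (a collection, region or period).
    """
    lines = [
        f'<form action="{escape(action_url, quote=True)}" method="get">',
        f'  <label>{escape(label)} <input type="text" name="q"></label>',
    ]
    if scope_field and scope_value:
        lines.append(
            f'  <input type="hidden" name="{escape(scope_field, quote=True)}"'
            f' value="{escape(scope_value, quote=True)}">'
        )
    lines.append('  <input type="submit" value="Search">')
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical endpoint and field name.
    print(search_fragment(
        "http://catalogue.example.org/search",
        scope_field="collection",
        scope_value="Defence of Britain",
        label="Search the Defence of Britain records",
    ))
```

Changing the hidden field from a collection to a region or a period is all that distinguishes the examples discussed above; the visitor sees only a search box carrying the host site's own label.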

The ADS is one of only a few organisations who have taken the step to actually provide this functionality (see this earlier post for another). I understand that doing so was relatively straightforward for them, and look forward to seeing other organisations step forward to similarly unlock 'their' content from the heavy chains of their own web site! The ADS tools are available for download and re-use here.

Posted by Oliver Smith-Toynes at 12:35 | Make or Read Comments(0) | TrackBack (0)

Firefox browser comes with ability to search for content licensed under Creative Commons


November 23, 2004

The Creative Commons weblog draws my attention to an ability of the cross-platform Firefox web browser that I hadn't noticed.

Like Safari and other modern browsers, Firefox includes a 'Search' box in the menu bar. By default (like Safari) typing a term here sends it as a search to Google. However, clicking in Firefox's search box drops down a list of targets, including Creative Commons. Choosing this target and entering a search submits it to Creative Commons' list of content licensed for reuse under one of their licences.

A useful feature, and I wonder how heavily it's being used?

Posted by Oliver Smith-Toynes at 07:51 | Make or Read Comments(2) | TrackBack (0)

Search for peer reviewed material with Google Scholar


November 18, 2004

Google today launched a new beta service, Google Scholar.

Quoting their own information,

“Google Scholar enables you to search specifically for scholarly literature, including peer-reviewed papers, theses, books, preprints, abstracts and technical reports from all broad areas of research. Use Google Scholar to find articles from a wide variety of academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the web.

Just as with Google Web Search, Google Scholar orders your search results by how relevant they are to your query, so the most useful references should appear at the top of the page. This relevance ranking takes into account the full text of each article as well as the article's author, the publication in which the article appeared and how often it has been cited in scholarly literature. Google Scholar also automatically analyzes and extracts citations and presents them as separate results, even if the documents they refer to are not online. This means your search results may include citations of older works and seminal articles that appear only in books or other offline publications.”

There's some more information from ResourceShelf, and thanks also to Pete Johnston and Andy Powell at UKOLN for flagging this to me.

Posted by Oliver Smith-Toynes at 12:48 | Make or Read Comments(0) | TrackBack (0)

Find books in nearby libraries, thanks to OCLC and Yahoo!


November 16, 2004

Andy Boyer, OCLC's Open WorldCat Product Manager, writes in the Yahoo! Search blog, describing an addition to Yahoo!'s toolbar that lets the user search for library resources in nearby libraries... right in the Yahoo! toolbar on their browser.

At the moment, the toolbar allows searches on the 2,000,000 most popular records in WorldCat, but Andy says that they're hard at work providing access to more of the 57,000,000 catalogue records (for 944,506,147 individual holdings) they maintain.

Posted by Oliver Smith-Toynes at 12:05 | Make or Read Comments(0) | TrackBack (0)

The three stages of (library) search...


October 21, 2004

Lorcan Dempsey offers a short piece in the latest issue of Update, in which he discusses the evolution of search from monolithic databases holding content invisible to search engines through to a model in which OAI harvesting and the like is used to expose as much data as possible... potentially at the expense of richness and structure within the data.

Lorcan emphasises library applications, but the same is true elsewhere.

Information from the OCLC Research feed.

Posted by Oliver Smith-Toynes at 09:26 | Make or Read Comments(0) | TrackBack (1)

Google releases desktop search tool


October 14, 2004

Internet search engine Google today released a beta of their long-anticipated desktop search tool.

The tool indexes a variety of file formats on your computer and offers an ability to search across them in a manner familiar to Google users. At present, it only works with Internet Explorer on recent versions of Microsoft Windows, with no support for other browsers, older versions of Windows, or for alternative platforms such as the Mac.

News from a SearchEngineWatch RSS feed.

Posted by Oliver Smith-Toynes at 16:24 | Make or Read Comments(0) | TrackBack (0)

Google going to search more offline content


October 6, 2004

SearchEngineWatch reports on Google's announcement that their Google Print service is today launching a new programme to hugely increase the number of books and journals that they are going to digitise for searching.

Google is asking publishers to send in copies of their books for scanning. Users searching Google will then be able to find relevant snippets of content from these printed sources, and be pointed to various ways of accessing the full book off-line.

Strangely, the whole process is geared around the dismantling and scanning of printed books. The FAQ admits that they can't currently accept electronic submission of the computer files from which the book was printed!

It would be interesting to see some integration between this and things like the Million Book Project...

News from the SearchEngineWatch RSS feed.

Posted by Oliver Smith-Toynes at 15:04 | Make or Read Comments(0) | TrackBack (5)

One approach to finding 'truth' on the Web?


October 5, 2004

A piece by Mark Ward on the BBC News site offers one perspective on the difficulty that users face in judging the veracity, perspective or timeliness of 'facts' found online.

It's an increasingly important issue, but I'm not sure that a search engine such as the one discussed is really the answer...

Posted by Oliver Smith-Toynes at 09:32 | Make or Read Comments(0) | TrackBack (0)

Amazon's Search Engine launches


September 16, 2004

A9, the search engine from Amazon, has left Beta and been formally launched.

Basic search functionality comes from Google, but the site offers you a range of added benefits... especially if you're prepared to log in.

Posted by Oliver Smith-Toynes at 14:46 | Make or Read Comments(0) | TrackBack (0)

Search engines good enough?


August 12, 2004

"Some 87% of search engine users say they find the information they want most of the time when they use search engines."

So says a memo on search engine usage in the USA from the Pew Internet and American Life Project, on the basis of a phone survey of 1,399 Internet users earlier this year.

I do, too. Doesn't mean I don't want the results to be better though.

Information from the Pew Internet and American Life Project's RSS feed.

Posted by Oliver Smith-Toynes at 16:03 | Make or Read Comments(0) | TrackBack (0)


The Common Information Environment sponsors are eScience, the Joint Information Systems Committee, the British Library, the Museums, Libraries & Archives Council and the National electronic Library for Health.
Page Maintained by Oliver Smith-Toynes
Last Updated: April 09, 2018