Search

Help with Searching


This full-text search engine is designed to help you find information located within Environment Canada websites. Simply type the words or phrase that match the information you are searching for into the form to begin the process. The search engine begins by referencing a 'search index' to identify documents containing the specific term(s). A search index is essentially a database of all of Environment Canada's web pages and the associated metadata for these documents. The resulting list of web pages is displayed for the user's evaluation.

When a "full text" search of the department's websites is started, the search engine looks for those terms both in the text that is visible within the body of the page and in the text stored "invisibly" within the underlying HTML code.

Search results are ranked based upon the number of times the search term appears in the document.
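
As a rough illustration of frequency-based ranking, the sketch below counts how often a search term occurs in each page and lists the matching pages from highest to lowest count. The page names, the sample text, and the functions `term_frequency` and `rank_by_term_frequency` are hypothetical, chosen only for illustration; this is not the search engine's actual code.

```python
import re

def term_frequency(text, term):
    """Count whole-word, case-insensitive occurrences of term in text."""
    words = re.findall(r"\w+", text.lower())
    return words.count(term.lower())

def rank_by_term_frequency(pages, term):
    """Return (page name, count) pairs for matching pages, highest count first."""
    counts = [(name, term_frequency(text, term)) for name, text in pages.items()]
    matches = [(name, n) for name, n in counts if n > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample pages, for illustration only.
pages = {
    "weather.html": "Weather forecasts and weather warnings for Canada.",
    "climate.html": "Climate data and one weather summary.",
    "water.html": "Water levels and water quality.",
}

ranked = rank_by_term_frequency(pages, "weather")
print(ranked)
```

Pages that never mention the term are left out of the list, which mirrors the idea that only documents containing the search term are returned.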

Operators

  • AND - Finds documents containing all of the specified words or phrases. Peanut AND butter finds documents with both the word peanut and the word butter. The words may or may not be together in the page.
  • OR - Finds documents containing at least one of the specified words or phrases. Peanut OR butter finds documents containing either peanut or butter. The found documents could contain both items, but not necessarily.
  • NOT - Excludes documents containing the specified word or phrase. Peanut NOT butter finds documents with peanut but not containing butter.
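
One common way to picture the three operators is as set operations on the lists of pages that contain each word. The sketch below assumes a tiny in-memory index built from hypothetical pages; the names `build_index` and `lookup` are illustrative, and the example handles single words only, not phrases.

```python
import re

def build_index(pages):
    """Map each lower-cased word to the set of page names that contain it."""
    index = {}
    for name, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index

def lookup(index, word):
    """Return the set of pages containing word (empty set if none)."""
    return index.get(word.lower(), set())

# Hypothetical sample pages, for illustration only.
pages = {
    "a.html": "Peanut butter sandwich",
    "b.html": "Peanut allergy information",
    "c.html": "Butter tart recipe",
}
index = build_index(pages)

peanut = lookup(index, "Peanut")
butter = lookup(index, "butter")

both = peanut & butter         # Peanut AND butter: pages with both words
either = peanut | butter       # Peanut OR butter: pages with at least one word
only_peanut = peanut - butter  # Peanut NOT butter: peanut pages without butter

print(sorted(both), sorted(either), sorted(only_peanut))
```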

Language

  • English - Retrieves documents that are identified by the search index as being written in English.
  • French - Retrieves documents that are identified by the search index as being written in French.
  • Any Language - Disregards the language metadata associated with a document and retrieves the greatest number of records for the specified search terms. Since not all web documents contain the required 'language' metadata (dc.language), this option will maximize the number of results found. Searching by a specific language will return a much smaller set of results, and may exclude some documents that contain the search term(s) but have not been classified with the corresponding language metadata. (See technical information about searching by language ...)
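
The effect of the three language options can be sketched as a simple filter on each document's language metadata. The record layout, the URLs, and the `filter_by_language` function below are hypothetical, and the value "fra" for French is an assumption; only "eng" appears in this help page.

```python
def filter_by_language(documents, language=None):
    """Keep documents whose dc.language metadata equals language.

    Passing language=None plays the role of 'Any Language': nothing is filtered out.
    """
    if language is None:
        return list(documents)
    return [doc for doc in documents if doc.get("dc.language") == language]

# Hypothetical search hits, for illustration only.
documents = [
    {"url": "page-en.html", "dc.language": "eng"},
    {"url": "page-fr.html", "dc.language": "fra"},  # "fra" is an assumed code
    {"url": "page-untagged.html"},                   # no language metadata
]

english = filter_by_language(documents, "eng")
french = filter_by_language(documents, "fra")
any_language = filter_by_language(documents)

print(len(english), len(french), len(any_language))
```

The untagged page appears only in the 'Any Language' results, which is why that option returns the largest set.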

Technical Information

  • Harvest and Index - The search engine selects matching results from a search index of Environment Canada websites. When a new website is created within the Web Solutions Framework, this search index is automatically updated in 'real-time'. When a site outside the Web Solutions Framework is published, it is harvested on a regularly scheduled basis and included in the updated search index. The search index is a database of all Environment Canada web pages.
    • Robots.txt File - A 'robots.txt' file is a set of rules that instructs web crawlers to ignore certain parts of a website. A web crawler is an automated program sent out by a search engine that accesses a website and traverses it by following the hyperlinks present within its pages. The 'robots.txt' file can greatly improve the performance of the harvester and indexer, since rules can be set to ignore irrelevant files. For instance, the harvester would know that it is unnecessary to crawl through a directory containing only images. For an example of a robots.txt file, go to: http://www.ec.gc.ca/robots.txt
  • Language - The language of a web page is identified by evaluating the value in its 'language' meta tag (e.g. dc.language - <meta name="dc.language" scheme="ISO639-2" content="eng"/>). In cases where meta tags are not present, the occurrence of 'special characters' on the page is used to help identify the language of the document. This method may not always be correct. For example, an English document may be incorrectly designated as French if it contains French special characters such as é, è, ê, à, ç, etc. The reverse is also true: a French document that does not contain any French special characters will be incorrectly designated as English. It is important that all Environment Canada web pages contain the correct metadata to ensure that the search engine returns the most accurate results possible.
  • Ranking - Results appear in order of relevancy and date modified. The more times the search words appear in a document, the higher its relevancy. Recently modified documents appear higher in the results listing.
  • Words - Whether a word or "exact phrase" is designated, the terms entered in the search form will appear somewhere in the web pages retrieved by the search engine. The search terms can be either visible within the body of the page, or invisible within the document's title or metadata (dc.title, dc.subject, dc.description, dc.creator, keywords, etc).
  • Case Sensitivity - The search engine is NOT case sensitive. Whether you search for a word in lower case or with capital letters, the engine will retrieve all documents containing this word regardless of any capitalization in its spelling (e.g. 'Department' finds 'Department', 'department', 'DEPARTMENT', 'DEpartMent', etc.).
  • Special Characters - Following the same principle as case sensitivity, searching for a word without specifying any accents or special characters will find both versions: words without accents and words with accents. This is particularly useful when searching for documents in another language, such as French. For example, a search on the term 'Économie' finds 'Économie', 'économie', 'Economie', 'economie', etc.
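
Case-insensitive and accent-insensitive matching is often done by 'folding' both the search term and the indexed words to a common form before comparing them. The sketch below shows one way to do this with Python's standard `unicodedata` module; the `fold` function is a hypothetical helper, and the search engine's own method may differ.

```python
import unicodedata

def fold(text):
    """Strip accents and ignore case so that variants of a word compare equal."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (accents, cedillas) and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

variants = ["Économie", "économie", "Economie", "economie", "ECONOMIE"]
folded = {fold(v) for v in variants}
print(folded)

# The same folding covers plain capitalization differences.
print(fold("DEpartMent") == fold("department"))
```

Because every variant folds to the same string, a search for any one of them would match pages containing any of the others.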