XML World-Countries Query Exercises


In these exercises (and the companion data exercises), you are working with a sample data set about world countries, downloadable here as countries.xml. This data is adapted from the Mondial 3.0 database as hosted by the University of Washington, and was originally compiled by the Georg-August-Universität Göttingen Institute for Informatics. Each country has a name, population, and area (in km²). Some countries also list languages (with percentages of the population that speaks each language) and/or cities (with names and populations).

For more information on how to test your XPath, XQuery, and XSLT solutions, please see our quick guide to XML validation and querying.


XPath

1.  Return the area of Mongolia.

2.  Return the names of all countries with population greater than 100 million.

3.  Return the names of all countries where over 50% of the population speaks German. (Hint: Depending on your solution, you may want to use ".", which refers to the "current element" within an XPath expression.)
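As a minimal sketch of what the hint in question 3 means by ".", the XQuery snippet below runs on a made-up fragment. The library, book, and tag names are hypothetical and unrelated to countries.xml. Inside a predicate, "." stands for the node currently being tested.

```xquery
(: Toy data for illustration only; not the structure of countries.xml. :)
let $doc :=
  <library>
    <book title="A"><tag>xml</tag><tag>sql</tag></book>
    <book title="B"><tag>sql</tag></book>
  </library>
(: In tag[. = 'xml'], "." is the tag element under test, so the
   comparison uses that element's text value. :)
return $doc/book[tag[. = 'xml']]/@title/string()
(: expected result: A :)
```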

4.  Return the names of all countries where a city in that country contains more than one-third of the country's population.

5.  Return the population density of Qatar. Note: Since the "/" operator has its own meaning in XPath and XQuery, the division operator is infix div. So to compute population density use "(@population div @area)".
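To illustrate the difference between "/" and div noted in question 5, here is a small sketch on a hypothetical element. The room element and its width and height attributes are invented for this example and do not appear in countries.xml.

```xquery
(: Hypothetical element; width and height are made-up attributes. :)
let $r := <room width="6" height="3"/>
(: "/" steps from the element to its attribute; "div" performs the division. :)
return $r/@width div $r/@height
(: expected result: 2 :)
```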

6.  Return the names of all countries whose population is less than one thousandth that of some city (in any country).

7.  Return the names of all cities that have the same name as the country in which they are located.

8.  Return all city names that appear more than once, i.e., there is more than one city with that name. You may return a city name multiple times if it simplifies your query. (Hint: You might want to use the "preceding" and/or "following" navigation axes for this query, which were not covered in the video or our demo script; they match any preceding or following node, not just siblings.)
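The remark in question 8 that "preceding" matches more than siblings can be seen in the following sketch. It uses an invented root/group/item fragment, not countries.xml, and compares the "preceding-sibling" axis with the "preceding" axis for the last item in the document.

```xquery
(: Toy data for illustration only. :)
let $doc :=
  <root>
    <group><item>a</item><item>b</item></group>
    <group><item>c</item></group>
  </root>
let $last := ($doc//item)[last()]
(: The last item has no sibling items in its own group, but two items
   occur earlier in the document, inside a different group. :)
return (count($last/preceding-sibling::item), count($last/preceding::item))
(: expected result: 0 2 :)
```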

9.  Return the names of all countries containing a city such that some other country has a city of the same name.

The next four queries are variations on the same theme.

10.  Return the names of all countries whose name textually contains a language spoken in that country. For instance, Uzbek is spoken in Uzbekistan, so return Uzbekistan. (Hint: As in question 3, you may want to use ".", which refers to the "current element" within an XPath expression.)

11.  Return the names of all countries in which people speak a language whose name textually contains the name of the country. For instance, Japanese is spoken in Japan, so return Japan.

12.  Return all languages spoken in a country whose name textually contains the language name. For instance, German is spoken in Germany, so return German. (Hint: Depending on your solution, you may want to use data(.), which returns the text value of the "current element" within an XPath expression.)
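As a rough sketch of the data(.) hint in question 12, the snippet below applies it as the final step of a path over an invented list/item fragment (hypothetical names, not countries.xml). The path then yields text values rather than element nodes.

```xquery
(: Toy data for illustration only. :)
let $doc := <list><item>red</item><item>green</item></list>
(: Without the final step, $doc/item would return the two <item> elements;
   with data(.), each element is replaced by its text value. :)
return $doc/item/data(.)
(: expected result: red green :)
```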

13.  Return all languages whose name textually contains the name of a country in which the language is spoken. For instance, Icelandic is spoken in Iceland, so return Icelandic.

14.  Return the number of countries where Russian is spoken.

15.  Return the names of all countries that have at least three cities with population greater than 3 million.

16.  Return the names of all countries for which the data does not include any languages or cities, but the country has more than 10 million people.


XQuery

17.  Return the name of the country with the highest population. (Hint: You may need to explicitly cast population numbers as integers with xs:int() to get the correct answer.)
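The cast mentioned in the hint for question 17 matters because, without a schema, attribute values are untyped, and comparing two untyped values compares them as strings. The sketch below shows the effect on two hypothetical elements (n and its v attribute are made up for this example).

```xquery
(: Hypothetical elements; v is a made-up attribute holding a number as text. :)
let $a := <n v="9"/>
let $b := <n v="10"/>
return (
  $a/@v > $b/@v,                 (: string comparison: "9" sorts after "10" :)
  xs:int($a/@v) > xs:int($b/@v)  (: numeric comparison after the cast :)
)
(: expected result: true false :)
```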

18.  Return the name of the country that has the city with the highest population.

19.  Return the average population of Russian-speaking countries. (Note: You might choose to write this query in pure XPath.)

20.  Return the average number of languages spoken in countries where Russian is spoken. (Note: You might choose to write this query in pure XPath.)

21.  Return all country-language pairs where the language is spoken in the country and the name of the country textually contains the language name. Return each pair as a country element with language attribute, e.g., <country language="French">French Guiana</country>

22.  Create a list of French-speaking and German-speaking countries. The result should take the form:

<result>
<French>
<country>country-name</country>
<country>country-name</country>
...
</French>
<German>
<country>country-name</country>
<country>country-name</country>
...
</German>
</result>

23.  Return all countries that have at least one city with population greater than 7 million. For each one, return the country name along with its cities whose population is greater than 7 million, in the format:

<country name="country-name">
<big>city-name</big>
<big>city-name</big>
...
</country>

24.  Return all countries where at least one language is listed, but the total percentage for all listed languages is less than 90%. Return the country element with its name attribute and its language subelements, but no other attributes or subelements.

25.  Return all countries where at least one language is listed, and every listed language is spoken by less than 20% of the population. Return the country element with its name attribute and its language subelements, but no other attributes or subelements.

26.  Find all situations where one country's most popular language is another country's least popular, and both countries list more than one language. (Hint: You may need to explicitly cast percentages as floating-point numbers with xs:float() to get the correct answer.) Return the name of the language and the two countries, each in the format:

<LangPair language="lang-name">
<MostPopular>country-name</MostPopular>
<LeastPopular>country-name</LeastPopular>
</LangPair>

27.  Return the countries with the highest and lowest population densities. You can assume densities are unique. (As in problem 5, to compute population density use "(@population div @area)".) The result should take the form:

<result>
<highest density="value">country-name</highest>
<lowest density="value">country-name</lowest>
</result>

28.  For each language spoken in one or more countries, create a "language" element with a "name" attribute and one "country" subelement for each country in which the language is spoken. The "country" subelements should have two attributes: the country "name", and "speakers" containing the number of speakers of that language (based on language percentage and the country's population). Order the result by language name. For example, your result might look like:

<languages>
  ...
  <language name="Arabic">
    <country name="Iran" speakers="660942"/>
    <country name="Saudi Arabia" speakers="19409058"/>
    <country name="Yemen" speakers="13483178"/>
  </language>
  ...
</languages>


XSLT

Write each of the following queries using XSLT.

29.  Return a list of country names (just the names, no XML structure).

30.  Return all countries with population between 9 and 10 million. Retain the structure of country elements from the original data.

31.  Remove from the data all countries with area greater than 40,000 km² and all countries with no cities listed. Otherwise the structure of the data should be the same.

32.  Create an alternate version of the countries database: for each country, include its name and population as subelements, and the number of languages and number of cities as attributes (called "languages" and "cities" respectively).

33.  Create a table using HTML constructs that lists all countries that have more than 3 languages. Each row should contain the country name in bold, population, area, and number of languages. Sort the rows in descending order of number of languages. No header is needed for the table, but use <table border="1"> to make it format nicely, should you choose to check your result in a browser. (Hint: You may find the data-type and order attributes of <xsl:sort> to be useful.)

Challenge:  We created the countries.xml data used in these exercises by processing the University of Washington's mondial-3.0.xml data using XSLT. See if you can recreate what we did using one or more XSLT transformation specifications. Most of the omissions and changes we made to obtain the simplified data set should be evident from comparing the two files. Some notes: we ignored provinces but kept the cities in them; we discarded cities whose most recent population is less than 1 million; we kept only the first name listed for any city with multiple names; we discarded repeated listings for cities in multiple provinces; and China has two populations in the original data's attribute value, which we edited by hand. You may want to use a service like XML Indent to improve the formatting of your result.