Back at uni, we have
frequently heard of some mythical creatures. The eagle eye has often
spotted what the next step would be. ("Bystré oko snadno nahlédne"
was the expression in Czech.) There also was the clever quantum mechanic ("zručný kvantový mechanik") who used
the postulates of quantum mechanics and the knowledge of special functions to
make headway. While the appearance of the two entities was welcome at times, as
it broke the flow of mathematical operations, they often left us wishing we
could manage it so easily too.
One such entity made
its appearance only once. First, we were reminded about the basics of Riemann
integration
– cut the integration interval into sub-intervals, take the maximum of the
function on each of them multiplied by the length of the corresponding sub-interval,
add them together. Then send the number of sub-intervals to infinity and
evaluate the limit.
Fig. 1: The main idea of Riemann
integration. Picture by Lucas Vieira, Public domain, via Wikimedia Commons.
A simile followed: This
is the same approach as the one used by a cashier. When making the daily total,
they take value of take each coin and
add it to the total. Common
sense, right? A revelation followed, though. A smart cashier (šikovný
pokladník) does it in a different way. First, he counts all the coins of a
particular denomination and create a subtotal by multiplying the number of
coins by the denomination. In the end, he adds the subtotals. Clever indeed,
right? If you are in education or mathematics, you might guess what followed. The smart cashier's approach was used to approximate the area under the curve in Fig. 1.
This was our introduction to the Lebesgue integration by our Calculus lecturer,
Mirko Rokyta.
Fig. 2: Picture by User:Svebert, CC0, via Wikimedia Commons. The main idea of Lebesgue integration is to look
“horizontally” and count “how often” the function reaches a certain
value. The resulting construction is mathematically very powerful.
I have not heard of the smart
cashier since Fall of 2001.
A month or so ago, I came across
a challenge in my work. I had a dataset with colours reported by users and I wanted
to analyse if the colours were happy, neutral or sad. We were interested in the aggregate value of happiness, as well as partial sums per sets.
set 1 | set 2 | set 3 | set 4 |
magenta/fuchsia | burgundy | emerald | violet |
chocolate | orange | copper | brick red |
lemon | cerulean | baby blue | indigo |
navy blue |
| electric blue | byzantium |
|
| baby blue |
|
|
| brick red |
|
My original approach was straightforward: go through ever entry, assign it a value (1, 0 or -1) and process
these at the end. Just for fun, try doing so. My main difficulty at this point was
not the number of entries but the need to make a choice – often in a situation
where I had no strong opinion. Is brick red (left) a happy colour or is it neutral?
What about byzantium (right) – is the “dark enough” to be considered sad? Or is it regally happy? When
faced with the task, I did what seemed the best: I asked a colleague to rate as
well, and then we reconciled the differences. A few weeks later, an onerous
task became a seemingly insurmountable one:
set 1 | set 2 | set 3 | set 4 |
copper | jungle green | harlequin | ivory |
electric blue | green | lavender | maroon |
harlequin | magenta/fuchsia | magenta rose | magenta rose |
indigo | jungle green | emerald | gray |
mauve | indigo | orange | lemon |
lemon | green | orchid | navy blue |
erin | lemon | pink | lavender |
maroon | magenta rose | navy blue | peach |
jade | ivory | cyan/aqua | puce |
ivory | magenta/fuchsia | olive | indigo |
navy blue | gray | orange | ivory |
electric blue | jungle green | mauve | electric blue |
magenta rose | gray | emerald | indigo |
lime | indigo | magenta rose | harlequin |
ivory | navy blue | magenta rose | cyan/aqua |
olive | ochre | lemon | green |
navy blue | jade | jade |
|
orange | copper | harlequin |
|
lavender | erin | magenta/fuchsia |
|
ivory |
| green |
|
lilac |
| maroon |
|
magenta rose |
| jungle green |
|
magenta/fuchsia |
| green |
|
lilac |
| ivory |
|
green |
| olive |
|
magenta/fuchsia |
| green |
|
mauve |
|
|
|
mauve |
|
|
|
jade |
|
|
|
electric blue |
|
|
|
mauve |
|
|
|
Short
of employing an algorithmic approach (or paying someone to do it), the only option
seems to be going entry by entry. Entry by entry..., like Riemann. Riemann is like a cashier..., oh, what would the clever cashier do? I do not know yet, but let’s see. Wait, if you order
the first column, you’ll have four instances of mauve, three of electric
blue and three of ivory. So you can count the mauves in the first column,
assign it a value (1?) and proceed. Soon afterwards I realise that ivory and
mauve are also present in the other columns. Now the solution is clear: you only
need to rate the unique entries! Lookups and sums will take care of the rest. In
the end, the work needed was reduced by a factor of 3. (That’s a big deal.)
The Clever Cashier emerges
victorious. More importantly, the episode shows how a simple beautiful concept
can be reused in a different setting decades after being learnt. Allow me to
end with a big appreciation to Mirko
Rokyta for teaching in this way.