Left Shift is Multiplication

The left-shift operator acts like multiplication: if I = 3, then I << 1 is 6. How does that work? Let’s take 12 as an example. In binary, 12 is 1100, which is:

2^3 + 2^2 = 8 + 4 = 12

Now shift it left by one, so every bit moves up one power of two:

2^4 + 2^3 = 16 + 8 = 24

Indeed, the shift multiplied the original value by 2. What happened here?

Well, let’s multiply the first line by 2:

2 * (2^3 + 2^2) = 2^4 + 2^3.

Now it is clear that left-shifting by one means you’re multiplying by two. If you left-shift again, you’re multiplying the original by 2^2, and so on.

If you want to multiply by a number that is not a power of two, say 17, then it would be something like this:

17 = 2^4 + 1 = 16 + 1

So you left-shift 4 times, and add the original number.

N * (2^4 + 1) = N(16 + 1) = (N << 4) + N
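
This is easy to check in code. A quick Python sketch (the helper name is mine):

def times_17(n):
    # n * 17 == n * (16 + 1) == (n << 4) + n
    return (n << 4) + n

print(times_17(3))   # 51
print(12 << 1)       # 24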

Fascinating!

Converting Binary to Integer

I recently came across this example in my “Learning Python” book:
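
The code went something like this (reconstructed from memory, so details may differ slightly from the book):

B = '1101'
I = 0
while B != '':
    I = I * 2 + (ord(B[0]) - ord('0'))
    B = B[1:]
print(I)   # prints 13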

It converts a binary string to an integer. The ord function is interesting, but I was interested in the algorithm itself. The algorithm says to look at the digits one by one and, each time through the loop, multiply the current value by 2 and add the next digit’s integer value. In this case, ‘1101’ would end up printing 13, which is correct: 13 in binary is 1101.

But then I struggled to think of why this works. My naive solution would be to create a powers-of-two table, work my way from left to right, and take the highest value I can without going over. For instance, 2^4 is 16, which is bigger than 13, so I have to use 2^3, which is 8. Thus there is a bit in the 2^3 spot. Next I try 2^2, which is 4, and 8 + 4 is less than 13, so I can put a bit in the 2^2 spot. Then I try 2^1, which is 2, and 8 + 4 + 2 is 14, which is larger than 13, so there is a 0 in the 2^1 spot. And finally I need 2^0, as that is 1, and 8 + 4 + 1 = 13.

Since that was my solution, I would try to convert it to code – because often, how you think about a solution is how you write it in code.
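
In Python, that approach might look like this (the function name is mine); note that it recovers the bit pattern from the integer:

def to_binary(n):
    # find the highest power of two that fits in n
    p = 1
    while p * 2 <= n:
        p *= 2
    # work downward, taking each power that fits without going over
    bits = ''
    while p >= 1:
        if n >= p:
            bits += '1'
            n -= p
        else:
            bits += '0'
        p //= 2
    return bits

print(to_binary(13))   # '1101'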

However, that was not what the book had. I now know what the book did.

Any integer can be represented in the form:
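
N = x_n*q^n + x_(n-1)*q^(n-1) + ... + x_1*q^1 + x_0*q^0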

N is the integer, and the x’s are its digits, whose possible values depend on the base. So if it’s base 2 (binary), each digit is either 0 or 1. If it’s base 10, each digit is 0–9. q^n is the base raised to the nth power; for binary, it’s 2^n.

13 can be represented like this:
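
13 = 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 8 + 4 + 0 + 1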

Now we can factor the 2 out, and keep factoring out the inner 2 until we can’t anymore:
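
13 = 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0
   = 2*(1*2^2 + 1*2^1 + 0) + 1
   = 2*(2*(1*2 + 1) + 0) + 1

Working from the innermost parentheses outward, starting with the leading digit 1:

(1) 1*2 + 1 = 3
(2) 3*2 + 0 = 6
(3) 6*2 + 1 = 13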

We end up with the bottom form, which makes it easier to see why the original algorithm works. As a reminder, the original algorithm says to multiply the current value by 2, and then add the next digit. That is exactly what happens at (1), (2), and then (3).

So it all makes sense now! Yay!

Linked Lists to the Rescue – Part II – Queues

In Part I, we looked at an implementation of Stacks using Linked Lists. In this post, we’ll look at Linked Lists with Queues.

As a review of the Linked Lists of Stacks, here’s a drawing of the implementation of Stacks using the linked-list nodes:

[Drawing: a stack built from linked-list nodes, each node pointing left to the previous one]

As you can see, the latest node points to the previous node, which points to the node previous to that, etc…

Whenever we add a new node to the Stack, we make sure to point it back to the previous node. Once we “pop” the stack, we return the value of the “current” node, and follow the “current” node’s pointer back to the previous node, which becomes the new “current” node. The old “current” node is now orphaned, and left for the garbage collector. The main thing to note is that the pointers point backwards (towards the left) as we add more nodes to the Stack.

In my implementation of a Queue, the pointers are going to go in the opposite direction! Check it:

[Drawing: a queue built from linked-list nodes, each node pointing right to the next one]

The pointers are now going to the right. What’s going on? In a queue, recall that it’s “FIFO”, or “First In, First Out”. We start out with a single node as our first item. When we add a node, the previous node points to our new node.

In order to “pop” (we call it “dequeue” in this case), we have to remember our very first node on the left, so we assign a variable to keep track of it. After that first node is returned, we follow its pointer to the next node, and make this next node the “first” node. The old first node is then orphaned, and gets cleaned up by the garbage collector.

In addition to remembering our first node, we also have to keep track of our most recent node, in order to add the future nodes to it. Let’s talk with code now:
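
Here is a minimal Ruby sketch of the idea, reusing the node class from Part I (the @q and @current names are from this post; the rest, such as the method names, are my assumptions):

require_relative 'node'

class Queue
  def initialize(item)
    @q = Node.new(item)      # the very first node – the head of the queue
    @current = @q            # the latest node – the tail of the queue
  end

  def enqueue(item)
    node = Node.new(item)
    @current.next = node     # the previous node points to our new node
    @current = node
  end

  def dequeue
    item = @q.item
    @q = @q.next             # the old first node is orphaned for the GC
    item
  end
end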

The “@q” variable is our “first” node, and the “@current” variable is the latest node.

To summarize, creating Stacks and Queues with linked lists has some advantages:

  • We don’t need to keep track of the size of an array – memory use depends only on how many items the list currently holds
  • As another point related to size – the bigger the array, the longer certain operations take, such as resizing. For instance, downsizing a large array takes at least as many operations as there are values to copy. In linked lists, the time is independent of the size
  • It’s easier to think about conceptually than the resizing (more of an opinion)
  • Linked lists are so cool!

    Linked Lists to the Rescue – Part I – Stacks

    As one might have imagined from the many hints and questions from the previous post, linked lists can be very useful when dealing with Stacks and Queues.

    To define a linked list, let’s start with the basic element of a linked list, called a node. Here is my silly illustration of a node:

    [Drawing: a single node – a left box for the value, a right box for the next-pointer]

    The left box represents the “value” of the node, and the right box represents where to look for the next node. That’s it!

    You can put nodes in a list like this:

    [Drawing: three nodes chained together, each pointing to the next]

    The leftmost node has a value, and it also tells us where to go for the next node, which is the node in the center. The center node has a value and tells us to look at the right node next. Nodes chained together like this are called a “linked list.”

    Let’s see a node represented in Ruby!
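
    A minimal sketch of what that node class could look like:

    class Node
      attr_accessor :item, :next

      def initialize(item, next_node = nil)
        @item = item          # the value this node holds
        @next = next_node     # where to look for the next node
      end
    end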

    In this “node class”, there are two variables: “item” and “next”.

    The “item” can hold anything you want: Strings, Numbers, Arrays, Hashes, other Ruby classes, etc… the “list” goes on (pardon my intended pun). The “next” is like a pointer, in that it shows us where to look for the “next” node.

    To see it in action, here is one implementation of a Stack using Linked Lists:
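
    A sketch of what that implementation could look like (the method names are my assumptions):

    require_relative 'node'

    class Stack
      def initialize(item)
        @current = Node.new(item)              # the only node we ever track
      end

      def push(item)
        @current = Node.new(item, @current)    # the new node points back to the old one
      end

      def pop
        item = @current.item
        @current = @current.next               # the old node is orphaned for the GC
        item
      end
    end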

    We need our node class, so the first line requires it. Before saying anything else, see how much shorter the code is already! This has to do with not needing to remember the array size. If all you need to remember is where to go next, you don’t need to resize your data structure once you hit size limits. The linked list shrinks and grows on command, and “orphaned” nodes are cleaned up by the garbage collector. It seems more pleasant to read (and implement) the stack class using linked lists.

    The stack is initialized by creating a single node. As we add items to the stack, we tell each new node to point to the previous node. Unlike the array implementation, where we had to keep track of all the elements in the stack, with the linked-list implementation all we ever need to know at any point in time is the latest node. When we remove items, we follow the latest node’s pointer to the previous node, and make that previous node the “current” node.

    In Part II, let’s see how to implement Queues with Linked Lists!

    A World Without Linked Lists

    To get a sense of how important Linked Lists are, it might be useful to imagine a world without Linked Lists.

    Suppose we want to build a Stack. A Stack is a collection of objects in a sequence, where the last object added is also the first object removed. The common term is “Last in, first out”, or “LIFO.”

    A commonly used real-world example for a Stack is a stack of pancakes. When pancakes are placed on top of each other, the last pancake added is the one on the very top. This very top pancake is also the first to be eaten.

    How would we implement a Stack without using Linked Lists?

    Let’s say we are allowed to use an Array to implement our Stack. How big should our array be at the start?

    By its definition, a Stack implies that its size varies with the addition or removal of objects. What if our Stack outgrows our initial Array size?

    If we are able to solve the problem of our Stack outgrowing our Array size, what then if our Stack eventually shrinks significantly? Would we have to hold a large number of empty elements in our Array?

    With some thought, we are able to implement our Stack using just an Array. Here is one implementation:
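
    A Ruby sketch of the idea (the names and exact resizing details are my assumptions):

    class ArrayStack
      def initialize
        @items = Array.new(1)
        @n = 0                  # number of items currently stored
      end

      def push(item)
        resize(2 * @items.length) if @n == @items.length    # full: double
        @items[@n] = item
        @n += 1
      end

      def pop
        @n -= 1
        item = @items[@n]
        @items[@n] = nil
        # only a quarter full: halve, so we don't teeter on the resize boundary
        resize(@items.length / 2) if @n > 0 && @n == @items.length / 4
        item
      end

      private

      def resize(capacity)
        copy = Array.new(capacity)
        @n.times { |i| copy[i] = @items[i] }    # copying costs O(n)
        @items = copy
      end
    end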

    The main takeaway is that we double the array size whenever we reach our limit, and we halve it whenever the array falls to a quarter full (we wait until a quarter, rather than a half, so that we aren’t teetering on the edge of doubling or shrinking the array with a single add or drop in the stack). In expanding or reducing the array, we make use of the “resize” method, which makes a copy of the entire array, temporarily doubling the space cost.

    Let’s look at another collection of objects – a queue. A Queue is “First in, first out”, or “FIFO”. A queue in the real world could be people lining up to see a movie. The first person in line is also the first person to get into the theater.

    We run across the same sizing questions as the Stack – and can handle them similarly.

    In the Stack implementation, the last element of the Array was always the element being popped, and the element being pushed was always added at the end of the Array. For a Queue, we need to keep track of the positions of the “head” and “tail” of the queue. If the beginning element of the array is removed when we “dequeue”, what do we do with the empty elements at the beginning of the array as we continue to dequeue?

    Here is one implementation of a Queue using Arrays:
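
    Again, a Ruby sketch of the idea (the details are my assumptions):

    class ArrayQueue
      def initialize
        @items = Array.new(1)
        @head = 0    # position of the next item to dequeue
        @tail = 0    # position where the next item will be enqueued
      end

      def length
        @tail - @head
      end

      def enqueue(item)
        resize(2 * @items.length) if @tail == @items.length
        @items[@tail] = item
        @tail += 1
      end

      def dequeue
        item = @items[@head]
        @items[@head] = nil
        @head += 1
        if @head == @tail
          @head = @tail = 0    # queue is empty: shift back to the front
        elsif length == @items.length / 4
          resize(@items.length / 2)
        end
        item
      end

      private

      def resize(capacity)
        copy = Array.new(capacity)
        n = length
        n.times { |i| copy[i] = @items[@head + i] }
        @items = copy
        @head = 0    # every resize shifts the head back to the front
        @tail = n
      end
    end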

    We have to be more careful with the head and tail positions of the array. Whenever we resize our array, or whenever our head position reaches our tail position, we shift our head position back to the beginning of the array. The length of our queue is determined by our tail position less our head position. Beyond these differences, the sizing decisions are handled just as in our Stack implementation.

    Boy – that’s a lot of resizing and keeping track of positions! A world without Linked Lists is tough!
    Can Linked Lists make things easier? What are Linked Lists?

    Connecting Storm and Kafka

    Previously, we set up Kafka and we’ve also set up Storm.

    Now, let’s connect the two!

    We assume that you are able to get the ExclamationTopology running from our previous guide. In this guide, we are going to use Kafka as the Spout for the ExclamationTopology.

    I usually check out the binary version of Storm to run Storm itself, but I like to develop on the source version of Storm. To that end, let’s grab the source version of Storm and extract it:

    $ wget http://apache.mirrors.pair.com/storm/apache-storm-0.9.2-incubating/apache-storm-0.9.2-incubating-src.tar.gz

    $ tar -zvxf apache-storm-0.9.2-incubating-src.tar.gz

    You can see that a Kafka-connector project has already been written for us, in external/storm-kafka.

    To understand how to leverage this project, let’s make Kafka the Spout for ExclamationTopology.

    First, copy the ExclamationTopology to the storm-kafka project, since that’s where all the KafkaSpout classes are (there are other ways of doing this, such as bundling storm-kafka into a jar, and using those libraries instead).

    $ cp apache-storm-0.9.2-incubating-src/examples/storm-starter/src/jvm/storm/starter/ExclamationTopology.java apache-storm-0.9.2-incubating-src/external/storm-kafka/src/jvm/storm/kafka

    Now add the following imports into ExclamationTopology:

    import storm.kafka.SpoutConfig;
    import backtype.storm.spout.SchemeAsMultiScheme;

    It should look something like this:

    [Screenshot: ExclamationTopology.java with the storm.kafka imports added]

    Remember also to change the package declaration above the import statements to storm.kafka (not storm.starter), as seen in the screenshot.

    Then use this to set up the Kafka Spout, assuming you have created a “test” topic as shown in our Kafka guide.

    // use your Zookeeper connect string here; for a remote instance this would
    // be its public DNS plus the port, e.g. ec2-54-183-196-65.us-west-1.compute.amazonaws.com:2181
    String zkConnString = "localhost:2181";
    String topicName = "test";

    BrokerHosts hosts = new ZkHosts(zkConnString);
    SpoutConfig spoutConfig = new SpoutConfig(hosts, topicName, "/" + topicName, "discovery");
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
    KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

    builder.setSpout("kafka", kafkaSpout);

    Here is a screenshot:

    [Screenshot: the Kafka Spout configuration in ExclamationTopology.java]

    Then add the following lines to the pom.xml file currently in that directory. These have to go within the “plugins” bracket:

    Also, remove the “scope” field in the org.apache.kafka dependency in the pom.xml file under storm-kafka.

    Then, compile the package, and submit it to a Storm production cluster.
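
    With the layout from this guide, that looks something like the following (the exact jar name produced by your build may differ):

    $ cd apache-storm-0.9.2-incubating-src/external/storm-kafka
    $ mvn clean install -DskipTests=true
    $ mvn package
    $ storm jar target/storm-kafka-0.9.2-incubating-jar-with-dependencies.jar storm.kafka.ExclamationTopology production-topology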

    If you remember from our tutorial on Kafka, you can open up a Producer command-line terminal and start typing messages into Kafka. Go ahead and do this.

    On the other side, you should see Storm adding exclamations to everything you type.

    Congratulations, you have connected Storm with Kafka!

    Setting up Storm and Running Your First Topology

    This guide will set up Storm on a single Ubuntu instance, and show you how to run a simple Word Count Topology. This guide assumes no experience with Storm.

    Storm was created by Nathan Marz, and is designed for real-time event processing, improving on parts of Hadoop’s distributed design. Storm provides excellent documentation, which you are highly encouraged to go through. If you’re pressed for time, though, the following guide gets you started with running a simple real-time event processor (this is called a topology, but I assume you haven’t read any documentation and just want to get the thing up and running – though this puts you at a conceptual disadvantage, there’s nothing like getting your hands dirty right away).

    Setting up Storm

    First grab version 0.9.2 of Storm (the pre-compiled binary version):

    $ wget http://apache.mesi.com.ar/storm/apache-storm-0.9.2-incubating/apache-storm-0.9.2-incubating.tar.gz

    Extract the files:

    $ tar -zxvf apache-storm-0.9.2-incubating.tar.gz

    Grab the latest version of maven:

    $ sudo apt-get install maven

    If you are on a Mac:

    $ brew install maven

    On a Mac, also set the JAVA_HOME path through the ~/.bash_profile file:

    $ export JAVA_HOME=$(/usr/libexec/java_home)

    Check the maven version to see that it installed correctly:

    $ mvn -version

    [Screenshot: output of mvn -version]

    If you checked out the src version of Storm, you can build and install the Storm jars locally with the following command (requires pom.xml file). This doesn’t need to be done if you already downloaded the compiled version as this guide has shown. However, it’s worth noting now because you’ll be using this command to compile your projects if you modify any of the source code.
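
    $ mvn clean install -DskipTests=true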

    Instead of building jars for the Storm project (since we’ve checked out the compiled version), let’s build the jar file for the storm-starter example project. First go into the storm-starter project within the apache-storm-0.9.2-incubating/examples folder:

    $ cd apache-storm-0.9.2-incubating/examples/storm-starter

    Now compile and build the jar files:

    $ mvn clean install -DskipTests=true

    It should take a few minutes, and you’ll see a lot of output. At the end of the output, you should see:

    [Screenshot: BUILD SUCCESS output from Maven]

    You are now ready to run your first Storm job (called a “topology”). We are going to run the topology located in apache-storm-0.9.2-incubating/examples/storm-starter/src/jvm/storm/starter/WordCountTopology.java

    Let’s run the topology first, and then go briefly into the details of what is happening.

    In the storm-starter directory, issue:

    $ mvn compile exec:java -Dstorm.topology=storm.starter.WordCountTopology

    The whole process will take about 50 seconds, and you will see a whole bunch of output. The main thing to look for in this output is something like this:

    [Screenshot: word-count tuples being emitted]

    It should occur near the middle of all the output being shown. The end of the output should show a bunch of shutdown messages, along with a success message like this:

    [Screenshot: topology shutdown and success messages]

    Congratulations! You have run your first Storm topology!

    Storm has a local mode (called “Local Cluster”) and a production-cluster mode. The local mode is helpful for development and debugging, while the production-cluster mode is intended to run in a production environment. You just submitted the WordCountTopology in local mode.

    Let’s take a look at what you just did.

    A Storm topology consists of “Spouts” and “Bolts”. You can think of Spouts as obtaining the data, and Bolts as transforming the data. In a topology, you typically have one or more Bolts stemming from one Spout. The “data” in a Storm topology is called a “Tuple”.

    [Diagram: a Spout feeding Tuples to a chain of Bolts in a topology]

    In the WordCountTopology, the Spout used is the RandomSentenceSpout:

    [Screenshot: WordCountTopology setting RandomSentenceSpout as the spout]

    RandomSentenceSpout is located at apache-storm-0.9.2-incubating/examples/storm-starter/src/jvm/storm/starter/spout/RandomSentenceSpout.java

    If you take a peek at this file, you can see the sentences being used:

    [Screenshot: the sentences array in RandomSentenceSpout.java]
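
    From memory, the heart of RandomSentenceSpout looks roughly like this (comments are mine):

    @Override
    public void nextTuple() {
        Utils.sleep(100);   // small pause between emissions
        String[] sentences = new String[]{
            "the cow jumped over the moon",
            "an apple a day keeps the doctor away",
            "four score and seven years ago",
            "snow white and the seven dwarfs",
            "i am at two with nature" };
        // pick a sentence at random and emit it as a one-field tuple
        String sentence = sentences[_rand.nextInt(sentences.length)];
        _collector.emit(new Values(sentence));
    }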

    That explains the output in our example – the words being “emitted” are taken from these sentences. The nextTuple method is present in all Spouts, and determines what you want the Spout to do. As you can see, it is the method you typically override.

    Let’s now take a look at the Bolts in WordCountTopology.java:

    [Screenshot: WordCountTopology setting the SplitSentence and WordCount bolts]

    These bolts are classes defined within the same file (WordCountTopology.java). By their names, we can guess what they do. Let’s take a look at “SplitSentence”:

    [Screenshot: the SplitSentence class]

    It looks like it is calling a python script called “splitsentence.py”. Doing a little digging, we find this script located in apache-storm-0.9.2-incubating/examples/storm-starter/multilang/resources/splitsentence.py

    We’ve just stumbled upon a cool thing that Storm can do – bolts are allowed to be language-agnostic! This means we can write our logic in any language we please. In this case, we are splitting sentences with Python! Here is the splitsentence.py logic:

    [Screenshot: splitsentence.py]
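
    Reconstructed from the storm-starter sources, it goes roughly like this:

    import storm

    class SplitSentenceBolt(storm.BasicBolt):
        def process(self, tup):
            # split the incoming sentence on spaces, emit each word
            words = tup.values[0].split(" ")
            for word in words:
                storm.emit([word])

    SplitSentenceBolt().run()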

    As you can see, it’s splitting the sentences by a single space, and “emitting” each word in that sentence.

    So our first bolt “SplitSentence” is actually a python script that splits the sentences into words. Let’s take a look at our second bolt, “WordCount”, which is defined in WordCountTopology.java:

    [Screenshot: the WordCount bolt]
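
    Roughly, the bolt looks like this (again a reconstruction, with my comments):

    public static class WordCount extends BaseBasicBolt {
        Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.get(word);    // running count for this word
            if (count == null) count = 0;
            count++;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }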

    As you can see, a HashMap called “counts” is created, which stores the counts of each word going through.

    This is the basic and fundamental template of a Storm topology. All other topologies you see are just different variations on this.

    Just for completeness, let’s take a look at the rest of WordCountTopology:

    [Screenshot: the topology configuration in WordCountTopology.java]
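
    In outline (reconstructed), the configuration code is roughly:

    Config conf = new Config();
    conf.setDebug(true);

    if (args != null && args.length > 0) {
        // production: submit to the cluster under the given name
        conf.setNumWorkers(3);
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } else {
        // local mode: run for 10 seconds, then shut down
        conf.setMaxTaskParallelism(3);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }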

    As you might guess based on the names of the variables, the rest of the file is used for configuration information.

    conf.setDebug controls the verbosity of the output. The block of code within the “if” statement is configuration for production, and the block of code in the “else” statement is for local mode (which is what we just saw). The topology being submitted is called “word-count”, and we’ve asked the job to run for 10 seconds.

    In the meantime, as a “homework” assignment, you are encouraged to get the ExclamationTopology.java working, located in examples/storm-starter/src/jvm/storm/starter/ExclamationTopology.java

    If you are feeling ambitious, try modifying the input Spout (TestWordSpout.java), and see how things change. However, you will need to download the source version and build storm-core from scratch, as TestWordSpout.java is part of storm-core. Remember to issue the compile command at the top storm level after each modification of the code:

    $ mvn clean install -DskipTests=true

    Deploying Storm in Production

    Package the project for use on a Storm cluster. For instance, in storm-starter, do:

    $ mvn package

    The package should be in:

    target/storm-starter-{version}-jar-with-dependencies.jar

    You can check out the binary version of storm (as we did above), and use the “storm” command from there. You can also add the bin path to $PATH.

    Read the Storm documentation on setting up a cluster to fill in the storm.yaml part.

    I only modified the following in storm.yaml:

    storm.zookeeper.servers:
      - "localhost"
      # - "server2"
    nimbus.host: "localhost"

    Then start nimbus, supervisor, and UI:

    $ storm nimbus
    $ storm supervisor
    $ storm ui
    (localhost:8080 by default)

    Then, from the machine that has the jar-with-dependencies jar, submit it:

    $ storm jar /Users/haroldnguyen/workspace/storm-tutorial/apache-storm-0.9.2-incubating-src/examples/storm-starter/target/storm-starter-0.9.2-incubating-jar-with-dependencies.jar storm.starter.ExclamationTopology production-topology-1

    The logs are located in the logs/ directory of the binary version of Storm.

    The storm ui is pretty neat, giving you a summary of running topologies and a visualization.

    If you still have trouble, please read through this excellent documentation on Running Topologies on a Production Cluster.

    Conclusion

    Congratulations – you went from knowing nothing about Storm to running a Word Count topology! The world is really your oyster!

    In our next post, we’ll see how to connect Storm with Kafka!

    Setting up Zookeeper and Kafka

    This was tested on an m1.medium instance on AWS on Ubuntu 12.04.

    Installing Zookeeper

    Zookeeper is needed for Kafka. To quote the Zookeeper Apache project page,

    Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

    In other words, we need Zookeeper to maintain configuration for Kafka nodes in a multi-node environment. In this tutorial, we’ll only be setting up a single Kafka node, but Kafka still relies on Zookeeper for configuration information.

    We can start off by grabbing Zookeeper:

    $ wget http://mirrors.advancedhosters.com/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz

    Extract the files:

    $ tar -zxvf zookeeper-3.4.6.tar.gz

    Change to the zookeeper/conf directory:

    $ cd zookeeper-3.4.6/conf

    In this directory, create the zoo.cfg file from the provided template configuration file named zoo_sample.cfg:

    $ cp zoo_sample.cfg zoo.cfg

    Open up zoo.cfg and change the dataDir variable to point to /var/lib/zookeeper. (I used the vi editor to make these changes).

    dataDir=/var/lib/zookeeper

    Since this directory doesn’t exist yet, create it:

    $ sudo mkdir /var/lib/zookeeper

    Install Java on a fresh Ubuntu 12.04:

    $ sudo apt-get update
    $ sudo apt-get install openjdk-7-jdk

    In the zookeeper-3.4.6 directory, start Zookeeper:

    $ sudo bin/zkServer.sh start

    You should see a message similar to:

    [Screenshot: “Starting zookeeper ... STARTED”]

    You can test that you can connect to it from the Java client:

    $ bin/zkCli.sh -server 127.0.0.1:2181

    [Screenshot: zkCli.sh connected to 127.0.0.1:2181]

    Great! Now you have Zookeeper running.

    Installing Kafka

    We’ll set up kafka_2.9.2, since at the time of this writing, Storm-0.9.2 (which we will be using) plays nicer with this version of Kafka.

    First grab Kafka, and then extract it:

    $ wget http://mirror.cogentco.com/pub/apache/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz
    $ tar xvzf kafka_2.9.2-0.8.1.1.tgz

    Change to the kafka_2.9.2-0.8.1.1/config directory:

    $ cd kafka_2.9.2-0.8.1.1/config

    Here, change the log.dirs property in the server.properties file to /var/lib/kafka-logs:

    log.dirs=/var/lib/kafka-logs

    Create that directory:

    $ sudo mkdir /var/lib/kafka-logs

    In the server.properties file again, point the advertised.host.name property to the public DNS (the address that others can connect to) of your instance:

    advertised.host.name=<public DNS of your instance>

    In the kafka_2.9.2-0.8.1.1 directory, you can start the Kafka server:

    $ sudo bin/kafka-server-start.sh config/server.properties &

    Or, start it as a daemon:

    $ sudo bin/kafka-server-start.sh -daemon config/server.properties

    You can stop the server anytime:

    $ sudo bin/kafka-server-stop.sh

    Create a “topic” named test with a single partition and one replica:

    $ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

    See it at the command line:

    $ bin/kafka-topics.sh --list --zookeeper localhost:2181

    [Output of the list command: test]

    You can also start inputting command-line messages, and read the output in another terminal. First start the producer to the topic test:

    $ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

    Type a message and press ‘enter’. For instance, the message can be “hello”.

    [Screenshot: the console producer with the message “hello”]

    Open another terminal, and in the kafka_2.9.2-0.8.1.1 directory, start the consumer:

    $ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning

    This will start consuming all messages from the beginning of the topic, but you can set the offset. Messages are retained for 168 hours (7 days) by default, but this can be tuned to your desire.

    You should see each message given from the producer:

    [Screenshot: the console consumer echoing “hello”]

    Talking to Kafka with Python

    Instead of the command line, maybe we’d like to push messages to Kafka programmatically, say, from a Python script.

    First, get ‘pip’ so that you can install Python packages.

    $ sudo apt-get install python-pip

    Now install the kafka-python package:

    $ sudo pip install kafka-python

    While leaving your consumer terminal open, run the following Python script:
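
    The script would look something like this with the kafka-python API of that era (the message text is arbitrary):

    from kafka import KafkaClient, SimpleProducer

    # connect to the local broker and send one message to the "test" topic
    kafka = KafkaClient("localhost:9092")
    producer = SimpleProducer(kafka)
    producer.send_messages("test", "hello from python")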

    You should see the message show up in the consumer:

    [Screenshot: the consumer showing the message sent from Python]

    You’ve now successfully pushed messages to Kafka using a Python script. There are many more things you can do with the kafka-python package, and you are encouraged to look through the documentation on the kafka-python package Github page.

    Summary

    In this tutorial, you set up Zookeeper, Kafka, and used a Python script to push messages to Kafka. If you’ve seen any errors in this tutorial, please feel free to leave comments in the section below.

    Binary Search

    Sometimes, an elegant algorithm in computer science works the
    same way a person would perform a task. For example, if you were looking
    up a name in a phone book, you wouldn’t start at the very beginning
    of the book. You would perhaps start somewhere in the middle, and halve the search
    each time until you zeroed in on the target.

    Maybe you know that the phone book is alphabetized, and if the name started with a “B,”
    then you might start towards the beginning of the book. However, imagine that you didn’t know
    how many names there were for each letter, and it’s possible that names that
    start with “A” could take up a majority of the book. Here, you might guess to
    open the book up to about the center, and go from there. For the rest of
    this post, let us assume we don’t know the distribution of the numbers in an array,
    but only that it is sorted.

    When you look up a name in the phone book this way, you are roughly doing a “binary search.”
    Here is a table that shows how a computer might process the steps:

    [Table: the steps of a binary search (credit: http://introcs.cs.princeton.edu/java)]

    If we were to use a “brute-force” method – checking each number starting at
    the beginning – the process will take on the order of N/2 steps on average, where N
    is the number of elements in the array. If we guess at random until we find the target,
    it will also take on the order of N/2 steps on average.

    However, binary search halves the set that we must search each time, so it
    will only take on the order of lg N steps. For illustration, if there were 10,000,000
    numbers, the brute-force method would take 5,000,000 steps on average, whereas binary
    search would take at most about 23 steps!

    It is worth thinking about efficient algorithms, because even the best hardware will lose
    to a great algorithm.

    As always, let’s let the code do the talking!
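
    In place of the original snippet, here is a small Python sketch of binary search on a sorted array:

    def binary_search(sorted_list, target):
        # return the index of target in sorted_list, or -1 if absent
        lo, hi = 0, len(sorted_list) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if sorted_list[mid] < target:
                lo = mid + 1        # target can only be in the upper half
            elif sorted_list[mid] > target:
                hi = mid - 1        # target can only be in the lower half
            else:
                return mid
        return -1

    print(binary_search([2, 3, 5, 8, 13, 21], 8))   # 3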

    Euclid’s Algorithm

    Euclid was a very prolific mathematician, and it is probably a mistake to be surprised that his name resides in yet another algorithm – one used to determine the greatest common divisor. This is commonly known as Euclid’s Algorithm. This algorithm is very simple (only 2 steps), but it can seem subtle at first.

    The greatest common divisor (GCD) is the largest integer that divides both of two given integers. For instance, the greatest common divisor of 16 and 12 is 4. In this post, we consider only nonnegative integers.

    Consider the following illustration (left side only – the one labeled “Euclid’s example”), where the two black bars represent the integers for which we seek the greatest common divisor.

    [Illustration of Euclid’s Algorithm]

    The black bars are those labeled AB and CD. Suppose we craft the bars so that CF (blue) is in fact the greatest common divisor of AB and CD. Let AB be 10 times the length of CF, and CD be 7 times the length of CF.

    Suppose that the length of AB is larger than the length of CD. The trick in the algorithm is that the difference d between AB and CD (AB – CD) will always be at least as large as the GCD, unless AB = CD (in which case AB or CD is the GCD).

    The reason why d is always at least as large as the GCD is that the GCD is a factor of both AB and CD, so it must also be a factor of their difference. Therefore d is a positive multiple of the GCD, and so must be greater than or equal to it.

    The steps to the algorithm are as follows, if AB and CD are the two nonnegative integers we wish to find the GCD for:

    Step 1: Let AB be the larger of the two numbers. Then set d = AB – CD

    Step 2: If d = 0, then CD is the GCD. If d is not 0, then repeat Step 1 with AB = CD, and CD = d

    However, the subtraction method can take many steps if the two values AB and CD are far apart. It is common to reduce the number of steps by taking the remainder of AB divided by CD instead. The same logic still applies, except that each loop removes as many multiples of CD from AB as possible at once, rather than one at a time.

    We can now let the code do the talking!
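
    In place of the original snippet, here is a quick Python sketch of the remainder form:

    def gcd(a, b):
        # Euclid's algorithm: replace (a, b) with (b, a mod b) until b is 0
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(16, 12))        # 4
    print(gcd(10 * 7, 7 * 7)) # 7 -- AB is 10 units, CD is 7 units, the unit is the GCD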