The Smyth County Schools 2015-2016 Coding Challenge

Marion Senior High School students installing Linux on Beowulf cluster

The goal of the annual Coding Challenge is to increase students' interest in careers in computer science and other technical fields. These careers appeal to a growing number of Smyth County students, such as these members of the Marion Senior High School S.A.V.E. Club. S.A.V.E. Club members are shown here installing Linux on a group of surplus computers, which they are repurposing as a Beowulf cluster that will run the Hadoop "big data" processing software.

Quick Links

Standings
Batting Practice
First Inning
Second Inning
Third Inning
Fourth Inning
Fifth Inning
Sixth Inning
Seventh Inning
Eighth Inning
Ninth Inning

By Terry Hawthorne
Director of Technology

Last year, Smyth County Schools held a Coding Challenge to build students' interest in careers in computer science and programming. The 2014 Coding Challenge asked students to attempt level 10 of the Blockly Games maze puzzle. That Challenge was akin to running a mile in PE class. If you stuck with it long enough, you would eventually get to the end of the maze.

But now it's October and the World Series is coming up, so I've designed the 2015 Coding Challenge to be more like a baseball game than a mile run. Instead of a single problem to solve, there will be nine innings, each with its own problem, plus a batting practice challenge for warmup purposes. Each inning will consist of a computer science or technology topic and a related problem. You will score runs based on your ability to solve each inning's problem. Here's what you need to know about the 2015 Coding Challenge.

Batting Practice

Baseball players warm up with batting practice. This gives hitters a chance to swing at some easy pitches, before they face the opposing pitcher's 96-mph fastball. We are going to start with the Coding Challenge equivalent of batting practice. I'm going to toss you some easy pitches before you go up against the coding equivalent of Chicago Cubs pitcher Jake Arietta in the first inning. (Arietta won 22 games this season, while allowing only 1.77 earned runs per game. Both his fastball and sinker average 95 mph.)

The Setup: Teaching Computers to Write

Stockholm: The Swedish Academy announced today that the 2067 Nobel Prize in Literature is awarded to the iPhone 58z…

The first computers were people.

OK…imagining a smartphone as a Nobel laureate is a stretch, but there is no question that computers are taking over complex mental tasks that once only people could perform. This change is reflected in the evolution of the word computer. According to the Oxford English Dictionary, the first printed instance of the word computer appeared in 1613 in a publication entitled Yong Mans Gleanings; "I haue read the truest computer of Times, and the best Arithmetician that euer breathed, and he reduceth thy dayes into a short number."

Until the 20th Century, the world computer referred to a person who was employed in the task of performing mathematical calculations. With the development of mechanical and electronic devices that could outperform human computers, the meaning of the word changed to what we think of as a computer today. Now we have computers that can beat the world's best checkers, chess, or Jeopardy players. How about a computer that could replace sportswriters and other news reporters? In 2010, the New York Times filed a story about a startup company that had developed software that, given a stat sheet for an NCAA basketball game, could write a news story for the game. Other organizations are using computers to write weather reports. Can you tell which of the following pollen forecasts was written by a computer and which was written by a person?

But charts, graphs and rankings alone cannot replace words that tell a story. We humans love stories; a craving for narrative seems part of our nature.

—Randall Stross, When the Software Is the Sportswriter, New York Times, Nov. 27, 2010

Pollen counts are expected to remain high at level 6 over most of Scotland, and even level 7 in the south east. The only relief is in the Northern Isles and far northeast of mainland Scotland with medium levels of pollen count.

Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country. However, in Northern areas, pollen levels will be moderate with values of 4.

How much is 10,000 exabytes of data? If all of that data were transferred to a stack of new 128GB iPad Pro tablets, each of which is slightly more than 1/4-inch thick, the height of the stack would be considerably greater than the distance from the Earth to the moon, and the cost would far exceed the entire Gross Domestic Product of the United States. The iPad has a lot of uses, but massive data storage is not one of them.

Estimated global data in 2015, measured in gigabytes10,000,000,000,000
Estimated available storage capacity of 128GB iPad Pro in gigabytes114
Number of iPad Pros required to store the global data87,719,298,246
Thickness of an iPad Pro in inches0.27
Height of iPad Global Data Stack in miles373,804
Distance from earth to moon in miles238,900
Cost of the iPad Pro Global Data stack$83,245,614,035,454
USA's Gross Domestic Product 2014$17,419,000,000,000

These are examples of a technology called Natural Language Generation, or NLG. (It is customary in computer science circles to create a Three Letter Acronym, or TLA, for a new technology.) The goal of NLG is not to replace human writers, but to make data easier to understand by outputting it in story form. In 2015, there are an estimated 10,000 exabytes of data available in digital format, and computer scientists expect that number to increase to 40,000 exabytes by 2020 (one exabyte = 1 billion gigabytes). Much of that consists of social media posts, digital pictures and movies, sensor data, and log files. Computer scientists have developed methods of aggregating and analyzing that data through technologies such as Hadoop, but there is still a need to turn the data into narrative, or story form, to make it more accessible for non-technical users. NLG could help with that task.

For batting practice, we are going to look at a very simple example of how Natural Language Generation might work. I am going to give you a short computer program that can write headlines for news stories about the outcome of team sports games, like football, basketball, or baseball. We will use the scores of this fall's games between our three county high schools as our test data. Your task will be to read my program, then predict what it will output when it receives the test data.

Note: I wrote the program in pseudo code, which means it is not a real programming language like Java or Python. Instead, it shows the logical flow of the program. Computer scientists sometimes use pseudo code to sketch out a program, before coding it in a real programming language.

Here is some general information about the program:

The Code

// Define a function named WriteHeadline that uses team names and scores to generate a sports headline
// WriteHeadline needs four pieces of data, identified within the parentheses

function WriteHeadline(winningTeam, winningScore, losingTeam, losingScore)

	// Compute the margin of victory and assign it to a variable named marginOfVictory
	variable marginOfVictory = winningScore - losingScore
	
	// assign name of winning team to the subject of our headline
	// assign name of losing team to the direct object of our headline
	variable subject = winningTeam
	variable directObject = losingTeam
	
	// create a variable named space and assign a single space to it
	// because we don't want our subject, verb and direct object to run together
	space = " "
	
	if marginOfVictory >= 21 then
		verb = Random('Blasts','Pounds','Wallops','Trounces','Dominates')
		// Random is a function that randomly chooses a word from the list of sports cliches in parentheses.
		// We assume a 21-point or greater victory is a blowout.
		
	else if marginOfVictory Between 4 and 20 then
		verb = Random('Beats','Defeats')
		// We assume that a margin of victory between 4 and 20 points is a normal win.

		// if we get to this point, we know the winning team won by fewer than 4 points
	else
		verb = Random('Edges','Squeaks By','Nips')
		// If margin of victory was a field goal or less, it was a close game.
	end if
	
	// Construct the headline by combining the subject, verb, and direct object. Don't forget the spaces.
	myHeadline = subject + space + verb + space + directObject
	
	// In the preceding statement, the + symbol is used to string together, or concatenate, the words that make up the headline.
	// You can use the + symbol to piece together strings of text in Javascript, Java, Python, and other programming languages.
	// In these languages, the + symbol is also used to add two numbers.
	// The practice of allowing a single operator, like +, to perform two different operations is called operator overloading.
	
	
	// The function returns the completed headline to the program that called it.
	return myHeadline
	
end function

The Problem

To solve this problem, send me a headline that this program could generate for each of the following test cases, taken from this fall's Smyth County high school football games:

Scoring

You get 1 point for each correct answer you submit, up to a maximum of 3 points. (That's right—you can even score in batting practice.)

Earning Bonus Points

Last spring, Chilhowie High School's Haley Caudill broke a Virginia High School League record by hitting safely in over 40 consecutive softball games. Haley's hitting streak resulted in headlines and tweets like "Hitting Streak for Chilhowie's Caudill Reaches 31, "Haley Caudill extends her hitting streak," and "Chilhowie's Caudill Is a Hitting Machine." You can earn up to 20 bonus points by submitting a short essay on how to build a headline generator that could turn out headlines like these. You don't have to write pseudo code or come up with a working model. Assume that the input for the program will be the box score for a game, and that the program will be able to store and access box scores from previous games. Discuss some of the problems that a programmer would face, and strategies for making the headline generator's output more like that of a person. In other words, how could we improve the headline generator to the point where it could pass the Turing test? The Turing test is a test of a computer or software's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. It is named for Alan Turing, the British mathematician who helped the Allies break the Nazis' Enigma code machine during World War II.

First Inning

The Setup: Programming NLG

In batting practice, you analyzed a simple pseudo code function to generate sports headlines. Imagine that you are a newly hired programmer for a company that has landed a contract with ESPN to write short summaries of collegiate, minor league, and major league baseball games, based on the box scores. Since you are the junior member of the team, the team leader has given you the simplest job: code the headline generator function you saw in batting practice in a real computer language. Your function will be used by the programmers who are coding the software to generate the website with its news stories, as well as the team coding the automated Twitter feed. The team that is charged with slicing and dicing all of the box scores has agreed to give you the same four data points you saw in batting practice:

  1. Name of the winning team;
  2. Name of the losing team;
  3. Winning team's score;
  4. Losing team's score.

Scoring

Earn up to 30 points by sending me a source file for a function that performs the task described in the setup section. I'll accept any of the following files:

If you use a Mac spreadsheet program, please convert it to MS Excel format before you send it to me. I don't have anything against Mac software, but I don't have a Mac computer, so I won't be able to evaluate your submission unless you convert it to something I can run on a Windows machine.

If you get ambitious and code a solution in C, C++, C# or something like that, I will accept your submission, but I will have trouble judging whether it runs or not, because I don't have ready access to a compiler for those languages. And don't even consider submitting something in assembly language, because no one likes a show-off.

If you are completely stumped by how to select random words from a list, I suggest you read this tutorial on coding random behavior. If you are completely new to coding in HTML and JavaScript, here is a tutorial to get you started.

Bonus Scoring

Earn as many as 30 bonus points by submitting an essay on one of the following issues:

  1. Subject-verb agreement in the headline generator function. In my pseudo code, I assumed that the subject of the sentence would be singular and made all my verbs singular. That means that WriteHeadline(Hokies, 28, Tar Heels, 3) could generate Hokies Wallops Tar Heels, and that would be wrong (grammatically speaking). In your essay, explain how you would ensure that WriteHeadline will never make your English teacher wince.
  2. If the New England Patriots should lose by 21 or more points, it is perfectly acceptable to have a headline that uses words like Trounce, Blast, Embarrass, etc. After all, this is a group of professional athletes whose combined team payroll would fund the average Virginia school system for several years. Should we use the same harsh headlines for teams made up of non-professional players? What could the programmer do to create a kinder, gentler HeadlineWriter?

Commissioner's Note: I haven't developed the details for all of the remaining innings, but the setup titles will give you an idea what lies ahead. These are subject to change, in case I come across a better subject between now and the start of the inning. I plan to release a new inning about every two weeks.

Second Inning

The Setup: Slicing and Dicing Big and Not-So-Big Data

The first inning problem was really hard, so I'm going to ease up a little in the second inning. In fact, you can solve the second inning problem in less than five minutes, if you have a basic understanding of data and the tools computer scientists use to analyze it. If you lack that understanding, please read on.

Edgar F. "Ted" Codd invented the relational database. In 1970, he published a paper titled "A Relational Model of Data for Large Shared Data Banks." That paper described the theoretical model for the relational database. In 1981, Codd received the Turing Award, the computer science equivalent of the Nobel Prize, for his work.

In batting practice, you learned about the huge amounts of digital data that are being generated each year. Data is only as useful as our ability to understand it. There are a number of tools available for that purpose. For the second inning problem, we are going to look at one of the most commonly used tools: the relational database. A relational database is a database that stores data in the form of related tables. Each table represents a collection of information about a group of entities. The entities can be anything that we need to track: people, places, grades, courses, pictures, products, orders, etc. Relational databases are in wide use:

If you are a high school student who has taken the Computer Information Systems course, you have encountered Microsoft Access, the simplest of all the relational database management systems (RDBMS). Access has many features, but it is limited to individuals or small groups of users working simultaneously on the same database. For applications that have hundreds or thousands of simultaneous users, database developers use commercial database software such as Oracle and Microsoft's SQL Server, or open-source software such as MySQL and Postgres. If you don't have access to Access, you can download one of the open-source database systems or the free Oracle Database Express Edition. Another alternative is Google's Fusion Tables application, which Google describes as "an experimental data visualization web application to gather, visualize, and share data tables." You can access Fusion Tables from Google Drive or at this link. I will also accept answers that consist of the SQL language statements to solve the problem. See the scoring section for more information on this option.

Here is a simple example to help you understand how relational databases work. This example is a relational database that stores information about students. These databases are called student information systems. Student information system databases usually have a table called Students that looks something like this:

StudentIDFNameMNameLNameGenderDateOfBirthSchoolIDGrade_LevelEnroll_Status
1RobertMontgomerySmithM08/15/1998460120
2HannahMarieJonesF04/08/2010730KG0
3D'AngeloWilliamBentleyM12/15/200070092

Most of the column names are self-explanatory: FName means first name, MName means middle name, and so on. The SchoolID and Enroll_Status columns require some explanation. SchoolID identifies the student's school, and Enroll_Status indicates whether a student is currently enrolled, transferred out, graduated, or pre-registered. These pieces of information are stored as numeric codes. In this example, an Enroll_Status of 0 means a student is currently enrolled in school. If you wanted to determine how many students are currently enrolled in each of our schools, you could run a query like this:


SELECT Schools.SchoolName, COUNT(Students.StudentID) AS Total_Enrollment
FROM Students, Schools
WHERE Students.SchoolID = Schools.SchoolID
AND Students.Enroll_Status = 0
GROUP BY Schools.SchoolName
ORDER BY Schools.SchoolName;

This query is written in a language called SQL. Database administrators and developers use SQL to extract information from relational databases. The line that begins with SELECT tells the database system what columns you want; in this case, the name of the school, and the total number of students enrolled in the school. (SQL has a built-in function called COUNT that we are using to count all of the StudentIDs. Since every student has a unique StudentID number, this is the same as counting each student.) The AS clause in the first line is used to name the results of the COUNT function as Total_Enrollment. The FROM statement tells the database to select the data from two tables: Students and Schools. You already know what the Schools table looks like. The Schools table will have columns like SchoolID, SchoolName, SchoolAddress, etc. The SchoolID column contains numbers that uniquely identify each school. For example, Smyth County schools use the following SchoolID numbers.

SchoolIDSchoolNameState_School_Number
290Atkins Elementary School290
730Chilhowie Elementary School730
460Chilhowie High School460
851Chilhowie Middle School851
999999Graduated Students0
710Marion Elementary School710
680Marion Middle School680
700Marion Senior High School700
250Northwood High School250
120Northwood Middle School120
690Oak Point Elementary School690
720Rich Valley Elementary School720
740Saltville Elementary School740
750Smyth Career and Technology Center750
530Sugar Grove Elementary School555

Since each school has a unique SchoolID number, we can use that number in the SchoolID column in the Students table to identify a student's school. Students with SchoolID 700 are Marion Senior High students, students with SchoolID 290 are Atkins Elementary students, and so on. So why do we use numeric codes like SchoolID and Enroll_Status, instead of the easier-to-read text equivalents, such as the name of the school? Performance. In the query above, I filtered the results to currently enrolled students with the statement: Students.Enroll_Status = 0. When running this query, the database system only has to make one comparison to determine if a student meets this criterion: does the value in the student's Enroll_Status = 0? If the system used a string of text to identify students, such as "Currently Enrolled," the database system would have to perform 18 comparisons on the value in each student's Enroll_Status column—one comparison for each letter in the phrase "Currently Enrolled." That is an important piece of information to keep in mind: queries will run faster when they make comparisons between numeric values, not text values.

The other advantage to using numeric codes is that it enables us to store a piece of information such as the name of a school in only one place in the database. In all the other places in the database where school information is referenced, we use the numeric code for the school. That means that if the name of a school changes, we only have to change that name in one place in our entire database: the Name column in the Schools table.

The PowerSchool database software that we use to store student information has tables named Students and Schools. These tables are similar to the tables used in our example. There is one row in the PowerSchool Schools table for each of our 14 schools, and one row in the Schools table for each student who has been enrolled in Smyth County Schools since we began using PowerSchool several years ago. Each column in those tables represents an attribute of the entities in the table, such as the students' gender or date of birth, or a school's name or address. Each table always has one numeric column that uniquely identifies each row in the table. In the Students table, that column is StudentID. It is a positive whole number assigned to a student when we enroll the student in PowerSchool. That unique identifier is called the primary key for the Students table. It never changes. If we need to identify a student in another table, we can use that unique identifier to refer to the student. For example, if we want to record an excused absence for Robert Montgomery Smith on October 19, 2015, we could add the following row to our Attendance table.

StudentIDAttendanceDateAttendanceCode
110/19/2015E

When the primary key from one table appears in a related table, it is called a foreign key in the related table. The StudentID is a foreign key in the Attendance table. Since the Students and Attendance tables share a column, StudentID, we can join those two tables in a query, and have the relational database display the student's name along with all of the dates he has been absent or tardy. Note that this setup reinforces the fundamental idea that we only store information like a student's name in only place in the database: the name columns in the Students table. All of the other tables that contain student data use the student's StudentID number. That means that if a student's name changes, we only have to change the name in one place.

For this problem, I am going to give you a subset of data from two related PowerSchool tables: Students and Schools in CSV (Comma-Separated Value) format. CSV format is a common way to exchange data between systems. In a CSV file, each line represents an individual record (students and schools in these two files), and the columns in each line are separated by commas. The Students file contains five columns: STUDENTID, FIRST_NAME, GENDER, GRADE_LEVEL, and SCHOOLID. The Schools file contains two columns: SCHOOLID and NAME. The SCHOOLID column in the Schools table is a foreign key in the Students table. For example, the SchoolID for Atkins Elementary is 290; therefore you can list all students in Atkins Elementary by running a query that filters for SchoolID=290 in the Students table.

Download the Students CSV file.

Download the Schools CSV file.

Good to Know

Scoring

Earn 10 points per question by correctly answering each of the following questions:

  1. How many girls and how many boys are actively enrolled in your school?
  2. What is the enrollment by grade of your school?
  3. What is the most popular first name for boys and the most popular first name for girls among all of the students in the data I extracted?

Coaching Tips

You could answer each of these questions without using any software tools, but that would be extremely tedious. It would also run counter to the purpose of this challenge, which is to help students learn more about computer science and technology. Here are some suggestions for using software to answer the questions:

Bonus Scoring

Earn up to 60 bonus points by providing a creative and well-reasoned response to one or more of the following questions (up to 20 bonus points per question).

  1. Professional and major college sports teams now use data analytics to prepare for upcoming games. For example, a basketball coach going up against a team with a high percentage of made three-point shots could dig into that data and identify not only the opposing team's best three-point shooters but also their most accurate locations: from the corners, the wings, top of the key, etc., then tailor his defense to cover those players and their best shooting spots. Your assignment is to select a sport with which you are familiar, then design a data model to collect detailed performance data from that sport. The data should be sufficiently specific to enable opposing coaches to improve their game plans. Your answer should include the specific types of data you are collecting and how teams will benefit from that data.
  2. "It's really about scaling the greatest minds to every mind." That remark is from a video IBM produced on Watson. Watson is IBM's name for a "technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data." Watson, best known as the computer that beat the two best Jeopardy champions of all time, is now helping physicians develop more effective cancer treatments by analyzing huge amounts of medical data. Visit the Watson website to learn more about cognitive computing, then submit an idea for a new application for Watson. (Watson is not a traditional relational database management system, but is an example of a new generation of software that gives humans greater insight into data.)
  3. Come up with an idea for a new social media website. Describe the data model that you would develop to make this site work. (Facebook cofounder Mark Zuckerberg went through this exercise back in his college days when he created the social media site that eventually became Facebook.)

Submit your answers to codechallenge@scsb.org.


Third Inning

Word cloud based on first names of Smyth County students.

The Setup: Visualizing Data

The second inning problem required you to perform some data analysis using a relational database. Relational databases are very useful tools for digging into data, but sometimes what you really need is a tool to help people visualize data. For example, the name cloud above is based on the first names of Smyth County students—the more popular names generally appear in a larger font.

To earn top points for this challenge, you have to create several data visualizations: a few easy ones and one really hard one. For the easy ones, use the two data extracts from the second challenge (students and schools) to create one or more of the following visualizations:

You can earn up to 10 points for each visualization you create for a maximum of 30 points.

For the hard one, begin by downloading this Student Performance by Question (SPBQ) dataset. It is a tab-delimited text file that summarizes student performance by question on the 2014-2015 Algebra I SOL assessment. It consists of three columns:

  1. Descriptor, which is a text description of the skill that a particular SOL question is designed to access; for example, "Add polynomial expressions."
  2. Correct or Incorrect, indicating whether the question was answered correctly or incorrectly.
  3. Count, the number of questions answered correctly or incorrectly.

Here is an example. The following table shows that there were 336 total questions designed to test students' ability to add polynomial expressions. Students answered 167 of those questions correctly, and 169 incorrectly, indicating this is one of the more difficult skills.

DescriptorCorrect or IncorrectCount
Add polynomial expressions.Correct167
Add polynomial expressions.Incorrect169

You can earn up to 150 points by generating a visualization for teachers and administrators that enables them to quickly identify:

Save your visualizations in any standard graphics format and send them to codechallenge@scsb.org


Fourth Inning

The Setup: Cutting the Grass

Today's spreadsheet programs—Microsoft Excel, Google Sheets, Apple Numbers, Libre Office Calc, etc.—are descendants of VisiCalc. VisiCalc was the first spreadsheet software for personal computers, and it helped launch the personal computer revolution. People were so hungry for software that could handle thousands of tedious calculations in a couple of seconds that they would buy personal computers just so they could run VisiCalc. VisiCalc and its descendants have saved human beings countless hours of work. In his book Serious Play: How the World's Best Companies Simulate to Innovate, author Michael Schrage devoted an entire chapter to spreadsheets and argued that "low-cost spreadsheet software effectively launched the largest and most significant experiment in rapid prototyping and simulation in the history of business."

The power of spreadsheet software is its ability to model a real-world situation that can include hundreds or thousands of variables, and provide near instantaneous updates when any of those variables change. For the fourth inning challenge, use a spreadsheet to answer the following question: how much does it cost the Boston Red Sox to maintain the grass at Fenway Park for an entire season?

I don't expect you to contact the Red Sox front office and pose that question to someone there; instead, assume that you are a freelance information technology consultant, and you have been hired by the Boston Red Sox to create a solution that the ballpark's head groundskeeper can use to forecast his direct expenses for maintaining the park's grass over the course of a season. For the purposes of this simulation, assume that the Red Sox have contracted with you to provide a customized spreadsheet solution, and that I'm the head groundskeeper and that you have asked me the following questions. Don't assume that these are all the questions you need to account for in your spreadsheet. (A successful software developer has to learn how to ask questions, in order to understand the client's needs and develop a solution to satisfy those needs.)

schematic diagram of Fenway Park

Here is the schematic diagram provided to you by the groundskeeper. The dimensions marked on the diagram are the distances from home plate to the outfield walls at various points. It may also be helpful to know that the baseball diamond formed by home plate and the three bases is a 90-foot square.

How big is Fenway Park?
Here is a schematic showing the dimensions. I know all you computer people are good at math, so you can figure out how many square feet there are.
How often do you mow during the season?
Everyday.
Do you go over the same area more than once in order to get those fancy patterns?
No. We just mow in alternate directions. That makes the blades of grass bend more in one direction than another. That makes one side appear lighter than the other, which creates the patterns.
How many people do you use to mow?
Three.
What do they get paid?
$20 an hour.
Do you fertilize the grass?
Yes, about every six weeks, with Scotts® Fenway Park™ Lawn Fertilizer.

You can learn more about major league ballpark groundskeeping here and here.

Scoring

You can earn up to 100 points for this challenge by sending me a spreadsheet that takes into account the variables that the head groundskeeper needs to consider as he formulates his budget for maintaining the Fenway Park grass. The highest scoring spreadsheets will be those that:

You can use any spreadsheet to create your solution; however, I don't have access to a Mac computer, so if you use Numbers, please save the spreadsheet in Excel format before you send it to me. If you use Google Sheets, you can just share your solution with terryh@scsb.org.


Fifth Inning

The Setup: Make a Metronome With Scratch

I originally planned to have Coding Challenge participants create an interactive storybook using the Scratch programming language, but decided to change it up when I heard about the newly formed Smyth County chapter of JAM (Junior Appalachian Musicians). In honor of that event, the fifth inning problem is to create a metronome. Musicians have used metronomes for centuries to help them develop a sense of timing. The first metronomes were mechanical devices, and mechanical metronomes are still in use; however, many musicians now use electronic metronomes or metronome apps. A metronome app, a mimimum, should:

I selected Scratch for this problem because it enables beginning programmers to create applications with graphical user interfaces (GUI) without having to learn the complicated syntax of a programming language like Java or C++. Scratch has some limitations compared to traditional text-based programming languages; nevertheless, it is used by colleges and universities in some beginning computer science classes. It is certainly capable of creating a metronome app, as you can see in my proof of concept: Scratchy the BeatBox Blues Cat.

Scoring

To submit your answer, send me the link to your Scratch project.

Tip: It is OK to get started by looking at the code for my proof of concept Scratch project; however, just copying that code and submitting it with no or minimal changes won't earn you any points. My code has at least one major bug in it. If your code reproduces that bug, you will be penalized 50 points.

It's got a backbeat, you can't lose it.

Rock and Roll Music by Chuck Berry

Bonus Scoring

The backbeat is a feature of much popular music. In 4/4 time, the backbeat is a syncopated accent on beats 2 and 4 in each measure. For an extra 100 points, modify your metronome app by giving it an option to provide an audible or visible backbeat.

Sixth Inning

The Setup: We Will Rock You

Your assignment for this challenge is to create a digital jukebox using Scratch or another programming language of your choice. Your solution should enable users to play a song on their computers, tablets, or smartphones. You must not violate anyone's copyright. In other words, you can't rip a CD of songs to MP3 format, and use those for your songs. (This doesn't apply if you are the copyright holder for those songs. Note that just owning a commercial CD doesn't make you the copyright holder for the songs on that CD.)

That means you are going to have to create your own songs. Here are some options for accomplishing that:

  1. If you have some musical ability, record your own songs. You don't have to have a lot of equipment to do this. (British musician Jake Bugg used his iPhone to record the last song on his first CD.) You could also use the microphone built into most laptops, along with free recording and editing software, such as Audacity, open-source software which runs on Windows, Mac OS, and Linux. There are also free cloud-based solutions for recording and editing audio on Chromebooks. See www.apowersoft.com/record-audio-on-chromebook.html for suggestions.
  2. Even if you can't play an instrument and you can't sing, you can still produce music by using a program such as Apple's GarageBand or the free open-source LMMS software. Programs like GarageBand and LMMS let you create melodies and beats from sampled sounds or their included synthesized instruments. There are YouTube tutorials on using both of these products.
  3. If you are truly desperate, check back here during the final week of February. I will post some copyright-free electric guitar samples you can download and string together to create riffs from popular songs.

Scoring

Earn 100 points for creating a digital jukebox with an attractive, user-friendly interface. Earn an additional 5 to 50 points for each song in your jukebox, up to a maximum of 20 songs.

Why is there such a range of point values per song? Computers play an important role in music production today, and this challenge encourages you to explore that role by using computers to create or edit your own music. For the purposes of this challenge, I am going to call anything that sounds vaguely musical a song. So if you program Scratch to play a single descending major scale that sounds like the opening melody of "Joy to the World," you could earn 5 points. If you used LMMS or GarageBand to create an original, polished composition, you could earn up to 50 points. Our panel of distinguished music critics will be the judge of what you get.

Seventh Inning

The Setup: March Madness Bracketology

The NCAA men's Division 1 basketball tournament takes place in March, giving rise to the term March Madness. During March Madness, basketball fans often complete brackets that show the expected winners of each of the games played during the tournament. For this problem, you have to do two things:

  1. Use a computational tool other than a search engine to determine the purely mathematical odds* of filling out a perfect bracket, not counting the four First Four play-in games. In other words, you need to determine the total number of different brackets possible in the remaining 64-team single-elimination tournament. Hint: the answer is 1 in 9,223,372,036,854,775,808. You could easily find that answer using a search engine, which is why I'm giving it to you in advance. Your task is to use a spreadsheet, math software, calculator, programming language or other computational tool to compute the answer. Earn 50 points by sending me a screenshot of your solution, along with an explanation of how you obtained it. Your explanation should include both the mathematics underlying your calculation and how you used the computational tool to arrive at the answer.
  2. Explain why some computational tools can't compute the exact answer. For example, both Microsoft Excel 2010 and Google Sheets display the answer as 9,223,372,036,854,780,000. You can earn another 100 points for a thorough explanation of why Excel and Sheets come up with an inexact answer.

*By purely mathematical odds, I mean the odds of picking a perfect bracket out of all of the possible outcomes, which is 1 in 9 quintillion, 223 quadrillion, 372 trillion, 36 billion, 854 million, 775 thousand, 8 hundred and 8. Because the outcome of these games is not a random event like the toss of a coin, the actual odds of picking a perfect bracket are higher than the purely mathematical odds. For example, the teams in the NCAA tournament are seeded from 1 to 16, and any bracket that has a 16th seeded team advancing past the first round is extremely unlikely. The four 16th seeded teams play the four number 1 seeded teams in the first round, and no 16 seed has ever beaten a 1 seed since the tournament was expanded to 64 teams in 1985. (Three 16 seeds have come close. In 1990, number 16 Murray State took number 1 Michigan State to overtime before losing 75-71. The year before, number 16 Princeton lost to number 1 Georgetown 50-49, and number 16 East Tennessee State University lost to number 1 Oklahoma 72-71. In both of these games, the number 16 seeds had chances to win in the closing seconds. Southwest Virginia's own Calvin Talford was ETSU's second leading scorer in the Oklahoma game.)

Proof of Concept

Screenshot of Javascript solution to the Seventh Inning bracketology problem.

Eighth Inning

The Setup: Building the Perfect Pig

Pig is an example of what is called a jeopardy dice game. The rules are simple. Player A rolls a single die. If Player A rolls a 1, he gets 0 points for that turn and play passes to Player B. If Player A rolls something other than 1, that value is added to his turn. Then he has the option to roll again or pass. If he passes, play passes to Player B. If Player A decides to keep rolling, each roll is added to his total for the current turn, unless he rolls 1, in which case he loses all of his points for that turn. First player to reach 100 wins.

Pig poses interesting exercises in probability. The chances of earning points on each roll of the die are always 5 out of 6, while the chance of losing all of your points for that turn are 1 out of 6 on each roll. A superficial analysis would lead you to believe that you should never pass, but the problem is that your risk increases with each successful roll. On your first roll, you are risking 0 points, so you should never pass without rolling at least once. Say you roll 2 on your first roll. On your second roll, are you willing to risk that 2 for a 5/6 chance of getting more points? Mathematically speaking, you should. The chances that you will lose that 2 are only 1/6, and if you lose it, the chances of getting at least 2 points back on your next turn are 5/6. But what if you have had several successful rolls in a row, and you are up to 20 points for a particular turn? Are you willing to risk 20 on a single roll of the die? Your chances of keeping those 20 points are still 5/6, but you have to weigh the risk against the reward, which is a maximum of 6 points. You also have to take into account your opponent's score. Say you have 77 points and your opponent has 98 when you start a turn. In that case, you should keep rolling until you amass 23 points or roll a 1, because the probability of your opponent winning on his next turn is 5/6.

3D graph visualizing optimal play for Pig dice game based on each player's score and the total points accumulated in the current turn.

In 2004, two Gettysburg College computer scientists published an article in the UMAP Journal (Undergraduate Mathematics and Its Applications) that described the optimal play for Pig. Even though Pig is a very simple game, optimal play for it turns out to be "surprisingly complex and beyond human potential to memorize and apply," as the authors (Todd W. Neller and Clifton G.M. Presser) pointed out in a follow-up article in the UMAP Journal. The graph at the right is a visualization of Neller and Presser's optimization. Values within the gray solid represent the combinations of player score, opponent score, and current turn total for which a player should continue to roll.

For Problem 8, your assignment is to code a game in which a human player plays Pig against a software opponent.* For your game, assume that the human player goes first. Your program should accept input from the human player to indicate whether he wants to hold or roll again. You can use whatever method you choose to determine the software player's strategy. It could be as simple as always having the software player roll a set number of times per turn, or something more complex, based on its turn total (weighing risk against reward), or its turn total and the human player's current total. Since Neller and Presser have calculated the optimal Pig strategy, you could read their 2004 paper and code that strategy into your software player. Be forewarned: understanding that paper requires knowledge of linear algebra, and optimizing your software Pig player's strategy per Neller/Presser's method would be an extremely complex undertaking.

*This is actually an assignment from the Edhesive Introduction to Computer Science course that the Smyth County School Board is considering as a course offering for students beginning in the 2016-2017 school year.

Scoring


Ninth Inning

The Setup: Do You Have What It Takes to Break the Nazis' Secret Code?

This last problem is probably the hardest. I can't say for certain, because I haven't solved it myself. (I came up with all of the other problems, so I knew how difficult they would be before I published them.) Since this problem is probably going to take considerable effort to solve, I am going to release it early to give you maximum time to work on it.

Throughout history, warring armies and nations have devised various methods of keeping their battle plans and other communications secret. The Roman commander Julius Caesar used a simple substitution code to encrypt messages to his subordinates. A Caesar cipher just substitutes each letter of the plaintext message with a letter of the alphabet that is shifted a certain number of spaces to the left or right. For example, A could be replaced by D, B by E, C by F, and so on. Caesar ciphers are trivial to break, so cryptographers (people who create secret codes) developed more and more complicated ciphers down through the centuries.

During World War II, Germany used a mechanical cryptographic device called an Enigma machine to encrypt its military messages. The Enigma looked like an old-fashioned typewriter. When the operator typed a message, the machine produced an encoded version of the message, in which each letter was represented by a different substitution cipher. The genius of the Enigma was that its design prevented a codebreaker from using brute force attacks to decrypt its messages. (A brute force attack is one in which a codebreaker simply uses all the possible settings of a cipher to break a code. The military version of the Enigma had nearly 159 quintillion settings, and the Germans changed the settings every day, so it was not possible to use brute force methods against the Enigma.) Despite the complexity of the Enigma code, Allied codebreakers succeeded in cracking it. Alan Turing, the father of modern computer science, was one of the chief codebreakers responsible for breaking Enigma.

The Allied codebreakers were headquartered at an installation called Bletchley Park in England. Each day, they would receive a batch of Enigma-encrypted messages intercepted by Allied radio operators, and they would set to work on deciphering the messages. For the final challenge, consider yourself a member of the Bletchley Park team. Visit the web-based Enigma Cipher Challenge and use your codebreaking skills to decipher Message 1.

Scoring

Earn 600 points by sending me the original German and an English translation of Message 1. (The messages are encoded in German.)

Tips

I haven't deciphered Message 1 yet, but I will share my strategy for approaching the problem:

  1. Visit one of the Enigma simulations on the Internet to get some hands-on practice on using an Enigma. The Bletchley Park codebreakers had access to captured Enigma machines, so they were familiar with the operation of the machine. What they didn't have was the Germans' daily codebooks, which indicated the Enigma's initial settings for a particular day. In order to decipher an Enigma message, you have to figure out which of the 159 quintillion settings was used to encrypt the message.
  2. Read the Wikipedia entry on Enigma to learn how the machine worked. The sections on the machine's operations are particularly useful.
  3. Locate someone who speaks German or use an online translator to translate the German on the scrap of paper featured in Message 1 of the Enigma Cipher Challenge. My knowledge of German is rusty, but I think the paper reads something like this: "For Manfred, good night watch from Walter." I take that to mean that Manfred is the German sailor who is coming on watch to take over for Walter. Schl├╝ssel refers to the Enigma's encyption key and wechselen means to switch or change, so it appears Walter is telling Manfred to change the daily encryption key for the Enigma to the settings on the scrap of paper. (Writing down the settings on a scrap of paper would probably have been a breach of protocol, since the actual codebooks were printed with special water-soluble ink, so they could be quickly destroyed in case the ship was facing capture.) Wetterfunk may be a compound word: Wetter is German for weather and funk can mean wireless, so maybe Walter is reminding Manfred to send a weather report by radio to U-boats 105, 94, and 109 at 02:00 hours. German U-boats shared weather conditions and their locations by exchanging Enigma-encrypted messages. I believe the rest of the message on the scrap of paper refers to the initial settings of the Enigma's rotors, reflector, and plugboard. Figure those out and you should be able to decrypt the message into German.
  4. Then use a web-based German-to-English translator to translate the decrypted message.

Good luck! Check back here from time to time. If I come up with a more accurate translation of the contents of the German on the scrap of paper, I will post them here.

If trying to solve this problem causes you great mental stress, just remember the pressure the Bletchley Park codebreakers were under—millions of lives and the future of civilization hung in the balance when they were trying to crack Enigma.