Last year, Smyth County Schools held a Coding Challenge to build students' interest in careers in computer science and programming. The 2014 Coding Challenge asked students to attempt level 10 of the Blockly Games maze puzzle. That Challenge was akin to running a mile in PE class. If you stuck with it long enough, you would eventually get to the end of the maze.
But now it's October and the World Series is coming up, so I've designed the 2015 Coding Challenge to be more like a baseball game than a mile run. Instead of a single problem to solve, there will be nine innings, each with its own problem, plus a batting practice challenge for warmup purposes. Each inning will consist of a computer science or technology topic and a related problem. You will score runs based on your ability to solve each inning's problem. Here's what you need to know about the 2015 Coding Challenge.
A classic example of a problem that was solved in fits and starts over a period of time was the key exchange problem—how do people or organizations who need to exchange confidential information do so without having to first exchange an encryption key? Whitfield Diffie defined the problem when he foresaw that the growth of the Internet would lead to the need to develop an encryption system that wouldn't require everyone who needed to communicate confidentially to first physically exchange an encryption key. Diffie's identification of the problem eventually led to what is called public key cryptography, exemplified by the RSA crytosystem developed by the team of Ron Rivest, Adi Shamir, and Leonard Adleman. Amazon and all other online retailers owe a great debt to Diffie, Rivest, Shamir, and Adleman; without the work of these researchers, secure online commerce would not be possible.
Baseball players warm up with batting practice. This gives hitters a chance to swing at some easy pitches, before they face the opposing pitcher's 96-mph fastball. We are going to start with the Coding Challenge equivalent of batting practice. I'm going to toss you some easy pitches before you go up against the coding equivalent of Chicago Cubs pitcher Jake Arietta in the first inning. (Arietta won 22 games this season, while allowing only 1.77 earned runs per game. Both his fastball and sinker average 95 mph.)
Stockholm: The Swedish Academy announced today that the 2067 Nobel Prize in Literature is awarded to the iPhone 58z…
The first computers were people.
OK…imagining a smartphone as a Nobel laureate is a stretch, but there is no question that computers are taking over complex mental tasks that once only people could perform. This change is reflected in the evolution of the word computer. According to the Oxford English Dictionary, the first printed instance of the word computer appeared in 1613 in a publication entitled Yong Mans Gleanings; "I haue read the truest computer of Times, and the best Arithmetician that euer breathed, and he reduceth thy dayes into a short number."
Until the 20th Century, the world computer referred to a person who was employed in the task of performing mathematical calculations. With the development of mechanical and electronic devices that could outperform human computers, the meaning of the word changed to what we think of as a computer today. Now we have computers that can beat the world's best checkers, chess, or Jeopardy players. How about a computer that could replace sportswriters and other news reporters? In 2010, the New York Times filed a story about a startup company that had developed software that, given a stat sheet for an NCAA basketball game, could write a news story for the game. Other organizations are using computers to write weather reports. Can you tell which of the following pollen forecasts was written by a computer and which was written by a person?
But charts, graphs and rankings alone cannot replace words that tell a story. We humans love stories; a craving for narrative seems part of our nature.
Pollen counts are expected to remain high at level 6 over most of Scotland, and even level 7 in the south east. The only relief is in the Northern Isles and far northeast of mainland Scotland with medium levels of pollen count.
Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country. However, in Northern areas, pollen levels will be moderate with values of 4.
How much is 10,000 exabytes of data? If all of that data were transferred to a stack of new 128GB iPad Pro tablets, each of which is slightly more than 1/4-inch thick, the height of the stack would be considerably greater than the distance from the Earth to the moon, and the cost would far exceed the entire Gross Domestic Product of the United States. The iPad has a lot of uses, but massive data storage is not one of them.
|Estimated global data in 2015, measured in gigabytes||10,000,000,000,000|
|Estimated available storage capacity of 128GB iPad Pro in gigabytes||114|
|Number of iPad Pros required to store the global data||87,719,298,246|
|Thickness of an iPad Pro in inches||0.27|
|Height of iPad Global Data Stack in miles||373,804|
|Distance from earth to moon in miles||238,900|
|Cost of the iPad Pro Global Data stack||$83,245,614,035,454|
|USA's Gross Domestic Product 2014||$17,419,000,000,000|
These are examples of a technology called Natural Language Generation, or NLG. (It is customary in computer science circles to create a Three Letter Acronym, or TLA, for a new technology.) The goal of NLG is not to replace human writers, but to make data easier to understand by outputting it in story form. In 2015, there are an estimated 10,000 exabytes of data available in digital format, and computer scientists expect that number to increase to 40,000 exabytes by 2020 (one exabyte = 1 billion gigabytes). Much of that consists of social media posts, digital pictures and movies, sensor data, and log files. Computer scientists have developed methods of aggregating and analyzing that data through technologies such as Hadoop, but there is still a need to turn the data into narrative, or story form, to make it more accessible for non-technical users. NLG could help with that task.
For batting practice, we are going to look at a very simple example of how Natural Language Generation might work. I am going to give you a short computer program that can write headlines for news stories about the outcome of team sports games, like football, basketball, or baseball. We will use the scores of this fall's games between our three county high schools as our test data. Your task will be to read my program, then predict what it will output when it receives the test data.
Note: I wrote the program in pseudo code, which means it is not a real programming language like Java or Python. Instead, it shows the logical flow of the program. Computer scientists sometimes use pseudo code to sketch out a program, before coding it in a real programming language.
Here is some general information about the program:
To solve this problem, send me a headline that this program could generate for each of the following test cases, taken from this fall's Smyth County high school football games:
You get 1 point for each correct answer you submit, up to a maximum of 3 points. (That's right—you can even score in batting practice.)
Last spring, Chilhowie High School's Haley Caudill broke a Virginia High School League record by hitting safely in over 40 consecutive softball games. Haley's hitting streak resulted in headlines and tweets like "Hitting Streak for Chilhowie's Caudill Reaches 31, "Haley Caudill extends her hitting streak," and "Chilhowie's Caudill Is a Hitting Machine." You can earn up to 20 bonus points by submitting a short essay on how to build a headline generator that could turn out headlines like these. You don't have to write pseudo code or come up with a working model. Assume that the input for the program will be the box score for a game, and that the program will be able to store and access box scores from previous games. Discuss some of the problems that a programmer would face, and strategies for making the headline generator's output more like that of a person. In other words, how could we improve the headline generator to the point where it could pass the Turing test? The Turing test is a test of a computer or software's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. It is named for Alan Turing, the British mathematician who helped the Allies break the Nazis' Enigma code machine during World War II.
In batting practice, you analyzed a simple pseudo code function to generate sports headlines. Imagine that you are a newly hired programmer for a company that has landed a contract with ESPN to write short summaries of collegiate, minor league, and major league baseball games, based on the box scores. Since you are the junior member of the team, the team leader has given you the simplest job: code the headline generator function you saw in batting practice in a real computer language. Your function will be used by the programmers who are coding the software to generate the website with its news stories, as well as the team coding the automated Twitter feed. The team that is charged with slicing and dicing all of the box scores has agreed to give you the same four data points you saw in batting practice:
Earn up to 30 points by sending me a source file for a function that performs the task described in the setup section. I'll accept any of the following files:
If you use a Mac spreadsheet program, please convert it to MS Excel format before you send it to me. I don't have anything against Mac software, but I don't have a Mac computer, so I won't be able to evaluate your submission unless you convert it to something I can run on a Windows machine.
If you get ambitious and code a solution in C, C++, C# or something like that, I will accept your submission, but I will have trouble judging whether it runs or not, because I don't have ready access to a compiler for those languages. And don't even consider submitting something in assembly language, because no one likes a show-off.
Earn as many as 30 bonus points by submitting an essay on one of the following issues:
Commissioner's Note: I haven't developed the details for all of the remaining innings, but the setup titles will give you an idea what lies ahead. These are subject to change, in case I come across a better subject between now and the start of the inning. I plan to release a new inning about every two weeks.
The first inning problem was really hard, so I'm going to ease up a little in the second inning. In fact, you can solve the second inning problem in less than five minutes, if you have a basic understanding of data and the tools computer scientists use to analyze it. If you lack that understanding, please read on.
Edgar F. "Ted" Codd invented the relational database. In 1970, he published a paper titled "A Relational Model of Data for Large Shared Data Banks." That paper described the theoretical model for the relational database. In 1981, Codd received the Turing Award, the computer science equivalent of the Nobel Prize, for his work.
In batting practice, you learned about the huge amounts of digital data that are being generated each year. Data is only as useful as our ability to understand it. There are a number of tools available for that purpose. For the second inning problem, we are going to look at one of the most commonly used tools: the relational database. A relational database is a database that stores data in the form of related tables. Each table represents a collection of information about a group of entities. The entities can be anything that we need to track: people, places, grades, courses, pictures, products, orders, etc. Relational databases are in wide use:
If you are a high school student who has taken the Computer Information Systems course, you have encountered Microsoft Access, the simplest of all the relational database management systems (RDBMS). Access has many features, but it is limited to individuals or small groups of users working simultaneously on the same database. For applications that have hundreds or thousands of simultaneous users, database developers use commercial database software such as Oracle and Microsoft's SQL Server, or open-source software such as MySQL and Postgres. If you don't have access to Access, you can download one of the open-source database systems or the free Oracle Database Express Edition. Another alternative is Google's Fusion Tables application, which Google describes as "an experimental data visualization web application to gather, visualize, and share data tables." You can access Fusion Tables from Google Drive or at this link. I will also accept answers that consist of the SQL language statements to solve the problem. See the scoring section for more information on this option.
Here is a simple example to help you understand how relational databases work. This example is a relational database that stores information about students. These databases are called student information systems. Student information system databases usually have a table called Students that looks something like this:
Most of the column names are self-explanatory: FName means first name, MName means middle name, and so on. The SchoolID and Enroll_Status columns require some explanation. SchoolID identifies the student's school, and Enroll_Status indicates whether a student is currently enrolled, transferred out, graduated, or pre-registered. These pieces of information are stored as numeric codes. In this example, an Enroll_Status of 0 means a student is currently enrolled in school. If you wanted to determine how many students are currently enrolled in each of our schools, you could run a query like this:
SELECT Schools.SchoolName, COUNT(Students.StudentID) AS Total_Enrollment FROM Students, Schools WHERE Students.SchoolID = Schools.SchoolID AND Students.Enroll_Status = 0 GROUP BY Schools.SchoolName ORDER BY Schools.SchoolName;
This query is written in a language called SQL. Database administrators and developers use SQL to extract information from relational databases. The line that begins with SELECT tells the database system what columns you want; in this case, the name of the school, and the total number of students enrolled in the school. (SQL has a built-in function called COUNT that we are using to count all of the StudentIDs. Since every student has a unique StudentID number, this is the same as counting each student.) The AS clause in the first line is used to name the results of the COUNT function as Total_Enrollment. The FROM statement tells the database to select the data from two tables: Students and Schools. You already know what the Schools table looks like. The Schools table will have columns like SchoolID, SchoolName, SchoolAddress, etc. The SchoolID column contains numbers that uniquely identify each school. For example, Smyth County schools use the following SchoolID numbers.
|290||Atkins Elementary School||290|
|730||Chilhowie Elementary School||730|
|460||Chilhowie High School||460|
|851||Chilhowie Middle School||851|
|710||Marion Elementary School||710|
|680||Marion Middle School||680|
|700||Marion Senior High School||700|
|250||Northwood High School||250|
|120||Northwood Middle School||120|
|690||Oak Point Elementary School||690|
|720||Rich Valley Elementary School||720|
|740||Saltville Elementary School||740|
|750||Smyth Career and Technology Center||750|
|530||Sugar Grove Elementary School||555|
Since each school has a unique SchoolID number, we can use that number in the SchoolID column in the Students table to identify a student's school. Students with SchoolID 700 are Marion Senior High students, students with SchoolID 290 are Atkins Elementary students, and so on. So why do we use numeric codes like SchoolID and Enroll_Status, instead of the easier-to-read text equivalents, such as the name of the school? Performance. In the query above, I filtered the results to currently enrolled students with the statement: Students.Enroll_Status = 0. When running this query, the database system only has to make one comparison to determine if a student meets this criterion: does the value in the student's Enroll_Status = 0? If the system used a string of text to identify students, such as "Currently Enrolled," the database system would have to perform 18 comparisons on the value in each student's Enroll_Status column—one comparison for each letter in the phrase "Currently Enrolled." That is an important piece of information to keep in mind: queries will run faster when they make comparisons between numeric values, not text values.
The other advantage to using numeric codes is that it enables us to store a piece of information such as the name of a school in only one place in the database. In all the other places in the database where school information is referenced, we use the numeric code for the school. That means that if the name of a school changes, we only have to change that name in one place in our entire database: the Name column in the Schools table.
The PowerSchool database software that we use to store student information has tables named Students and Schools. These tables are similar to the tables used in our example. There is one row in the PowerSchool Schools table for each of our 14 schools, and one row in the Schools table for each student who has been enrolled in Smyth County Schools since we began using PowerSchool several years ago. Each column in those tables represents an attribute of the entities in the table, such as the students' gender or date of birth, or a school's name or address. Each table always has one numeric column that uniquely identifies each row in the table. In the Students table, that column is StudentID. It is a positive whole number assigned to a student when we enroll the student in PowerSchool. That unique identifier is called the primary key for the Students table. It never changes. If we need to identify a student in another table, we can use that unique identifier to refer to the student. For example, if we want to record an excused absence for Robert Montgomery Smith on October 19, 2015, we could add the following row to our Attendance table.
When the primary key from one table appears in a related table, it is called a foreign key in the related table. The StudentID is a foreign key in the Attendance table. Since the Students and Attendance tables share a column, StudentID, we can join those two tables in a query, and have the relational database display the student's name along with all of the dates he has been absent or tardy. Note that this setup reinforces the fundamental idea that we only store information like a student's name in only place in the database: the name columns in the Students table. All of the other tables that contain student data use the student's StudentID number. That means that if a student's name changes, we only have to change the name in one place.
For this problem, I am going to give you a subset of data from two related PowerSchool tables: Students and Schools in CSV (Comma-Separated Value) format. CSV format is a common way to exchange data between systems. In a CSV file, each line represents an individual record (students and schools in these two files), and the columns in each line are separated by commas. The Students file contains five columns: STUDENTID, FIRST_NAME, GENDER, GRADE_LEVEL, and SCHOOLID. The Schools file contains two columns: SCHOOLID and NAME. The SCHOOLID column in the Schools table is a foreign key in the Students table. For example, the SchoolID for Atkins Elementary is 290; therefore you can list all students in Atkins Elementary by running a query that filters for SchoolID=290 in the Students table.
Earn 10 points per question by correctly answering each of the following questions:
You could answer each of these questions without using any software tools, but that would be extremely tedious. It would also run counter to the purpose of this challenge, which is to help students learn more about computer science and technology. Here are some suggestions for using software to answer the questions:
Earn up to 60 bonus points by providing a creative and well-reasoned response to one or more of the following questions (up to 20 bonus points per question).
Submit your answers to email@example.com.
The second inning problem required you to perform some data analysis using a relational database. Relational databases are very useful tools for digging into data, but sometimes what you really need is a tool to help people visualize data. For example, the name cloud above is based on the first names of Smyth County students—the more popular names generally appear in a larger font.
To earn top points for this challenge, you have to create several data visualizations: a few easy ones and one really hard one. For the easy ones, use the two data extracts from the second challenge (students and schools) to create one or more of the following visualizations:
You can earn up to 10 points for each visualization you create for a maximum of 30 points.
For the hard one, begin by downloading this Student Performance by Question (SPBQ) dataset. It is a tab-delimited text file that summarizes student performance by question on the 2014-2015 Algebra I SOL assessment. It consists of three columns:
Here is an example. The following table shows that there were 336 total questions designed to test students' ability to add polynomial expressions. Students answered 167 of those questions correctly, and 169 incorrectly, indicating this is one of the more difficult skills.
|Descriptor||Correct or Incorrect||Count|
|Add polynomial expressions.||Correct||167|
|Add polynomial expressions.||Incorrect||169|
You can earn up to 150 points by generating a visualization for teachers and administrators that enables them to quickly identify:
Save your visualizations in any standard graphics format and send them to firstname.lastname@example.org
Today's spreadsheet programs—Microsoft Excel, Google Sheets, Apple Numbers, Libre Office Calc, etc.—are descendants of VisiCalc. VisiCalc was the first spreadsheet software for personal computers, and it helped launch the personal computer revolution. People were so hungry for software that could handle thousands of tedious calculations in a couple of seconds that they would buy personal computers just so they could run VisiCalc. VisiCalc and its descendants have saved human beings countless hours of work. In his book Serious Play: How the World's Best Companies Simulate to Innovate, author Michael Schrage devoted an entire chapter to spreadsheets and argued that "low-cost spreadsheet software effectively launched the largest and most significant experiment in rapid prototyping and simulation in the history of business."
The power of spreadsheet software is its ability to model a real-world situation that can include hundreds or thousands of variables, and provide near instantaneous updates when any of those variables change. For the fourth inning challenge, use a spreadsheet to answer the following question: how much does it cost the Boston Red Sox to maintain the grass at Fenway Park for an entire season?
I don't expect you to contact the Red Sox front office and pose that question to someone there; instead, assume that you are a freelance information technology consultant, and you have been hired by the Boston Red Sox to create a solution that the ballpark's head groundskeeper can use to forecast his direct expenses for maintaining the park's grass over the course of a season. For the purposes of this simulation, assume that the Red Sox have contracted with you to provide a customized spreadsheet solution, and that I'm the head groundskeeper and that you have asked me the following questions. Don't assume that these are all the questions you need to account for in your spreadsheet. (A successful software developer has to learn how to ask questions, in order to understand the client's needs and develop a solution to satisfy those needs.)
Here is the schematic diagram provided to you by the groundskeeper. The dimensions marked on the diagram are the distances from home plate to the outfield walls at various points. It may also be helpful to know that the baseball diamond formed by home plate and the three bases is a 90-foot square.
You can earn up to 100 points for this challenge by sending me a spreadsheet that takes into account the variables that the head groundskeeper needs to consider as he formulates his budget for maintaining the Fenway Park grass. The highest scoring spreadsheets will be those that:
You can use any spreadsheet to create your solution; however, I don't have access to a Mac computer, so if you use Numbers, please save the spreadsheet in Excel format before you send it to me. If you use Google Sheets, you can just share your solution with email@example.com.
I originally planned to have Coding Challenge participants create an interactive storybook using the Scratch programming language, but decided to change it up when I heard about the newly formed Smyth County chapter of JAM (Junior Appalachian Musicians). In honor of that event, the fifth inning problem is to create a metronome. Musicians have used metronomes for centuries to help them develop a sense of timing. The first metronomes were mechanical devices, and mechanical metronomes are still in use; however, many musicians now use electronic metronomes or metronome apps. A metronome app, a mimimum, should:
I selected Scratch for this problem because it enables beginning programmers to create applications with graphical user interfaces (GUI) without having to learn the complicated syntax of a programming language like Java or C++. Scratch has some limitations compared to traditional text-based programming languages; nevertheless, it is used by colleges and universities in some beginning computer science classes. It is certainly capable of creating a metronome app, as you can see in my proof of concept: Scratchy the BeatBox Blues Cat.
To submit your answer, send me the link to your Scratch project.
Tip: It is OK to get started by looking at the code for my proof of concept Scratch project; however, just copying that code and submitting it with no or minimal changes won't earn you any points. My code has at least one major bug in it. If your code reproduces that bug, you will be penalized 50 points.
It's got a backbeat, you can't lose it.
—Rock and Roll Music by Chuck Berry
The backbeat is a feature of much popular music. In 4/4 time, the backbeat is a syncopated accent on beats 2 and 4 in each measure. For an extra 100 points, modify your metronome app by giving it an option to provide an audible or visible backbeat.
Your assignment for this challenge is to create a digital jukebox using Scratch or another programming language of your choice. Your solution should enable users to play a song on their computers, tablets, or smartphones. You must not violate anyone's copyright. In other words, you can't rip a CD of songs to MP3 format, and use those for your songs. (This doesn't apply if you are the copyright holder for those songs. Note that just owning a commercial CD doesn't make you the copyright holder for the songs on that CD.)
That means you are going to have to create your own songs. Here are some options for accomplishing that:
Earn 100 points for creating a digital jukebox with an attractive, user-friendly interface. Earn an additional 5 to 50 points for each song in your jukebox, up to a maximum of 20 songs.
Why is there such a range of point values per song? Computers play an important role in music production today, and this challenge encourages you to explore that role by using computers to create or edit your own music. For the purposes of this challenge, I am going to call anything that sounds vaguely musical a song. So if you program Scratch to play a single descending major scale that sounds like the opening melody of "Joy to the World," you could earn 5 points. If you used LMMS or GarageBand to create an original, polished composition, you could earn up to 50 points. Our panel of distinguished music critics will be the judge of what you get.
The NCAA men's Division 1 basketball tournament takes place in March, giving rise to the term March Madness. During March Madness, basketball fans often complete brackets that show the expected winners of each of the games played during the tournament. For this problem, you have to do two things:
*By purely mathematical odds, I mean the odds of picking a perfect bracket out of all of the possible outcomes, which is 1 in 9 quintillion, 223 quadrillion, 372 trillion, 36 billion, 854 million, 775 thousand, 8 hundred and 8. Because the outcome of these games is not a random event like the toss of a coin, the actual odds of picking a perfect bracket are higher than the purely mathematical odds. For example, the teams in the NCAA tournament are seeded from 1 to 16, and any bracket that has a 16th seeded team advancing past the first round is extremely unlikely. The four 16th seeded teams play the four number 1 seeded teams in the first round, and no 16 seed has ever beaten a 1 seed since the tournament was expanded to 64 teams in 1985. (Three 16 seeds have come close. In 1990, number 16 Murray State took number 1 Michigan State to overtime before losing 75-71. The year before, number 16 Princeton lost to number 1 Georgetown 50-49, and number 16 East Tennessee State University lost to number 1 Oklahoma 72-71. In both of these games, the number 16 seeds had chances to win in the closing seconds. Southwest Virginia's own Calvin Talford was ETSU's second leading scorer in the Oklahoma game.)
Pig is an example of what is called a jeopardy dice game. The rules are simple. Player A rolls a single die. If Player A rolls a 1, he gets 0 points for that turn and play passes to Player B. If Player A rolls something other than 1, that value is added to his turn. Then he has the option to roll again or pass. If he passes, play passes to Player B. If Player A decides to keep rolling, each roll is added to his total for the current turn, unless he rolls 1, in which case he loses all of his points for that turn. First player to reach 100 wins.
Pig poses interesting exercises in probability. The chances of earning points on each roll of the die are always 5 out of 6, while the chance of losing all of your points for that turn are 1 out of 6 on each roll. A superficial analysis would lead you to believe that you should never pass, but the problem is that your risk increases with each successful roll. On your first roll, you are risking 0 points, so you should never pass without rolling at least once. Say you roll 2 on your first roll. On your second roll, are you willing to risk that 2 for a 5/6 chance of getting more points? Mathematically speaking, you should. The chances that you will lose that 2 are only 1/6, and if you lose it, the chances of getting at least 2 points back on your next turn are 5/6. But what if you have had several successful rolls in a row, and you are up to 20 points for a particular turn? Are you willing to risk 20 on a single roll of the die? Your chances of keeping those 20 points are still 5/6, but you have to weigh the risk against the reward, which is a maximum of 6 points. You also have to take into account your opponent's score. Say you have 77 points and your opponent has 98 when you start a turn. In that case, you should keep rolling until you amass 23 points or roll a 1, because the probability of your opponent winning on his next turn is 5/6.
In 2004, two Gettysburg College computer scientists published an article in the UMAP Journal (Undergraduate Mathematics and Its Applications) that described the optimal play for Pig. Even though Pig is a very simple game, optimal play for it turns out to be "surprisingly complex and beyond human potential to memorize and apply," as the authors (Todd W. Neller and Clifton G.M. Presser) pointed out in a follow-up article in the UMAP Journal. The graph at the right is a visualization of Neller and Presser's optimization. Values within the gray solid represent the combinations of player score, opponent score, and current turn total for which a player should continue to roll.
For Problem 8, your assignment is to code a game in which a human player plays Pig against a software opponent.* For your game, assume that the human player goes first. Your program should accept input from the human player to indicate whether he wants to hold or roll again. You can use whatever method you choose to determine the software player's strategy. It could be as simple as always having the software player roll a set number of times per turn, or something more complex, based on its turn total (weighing risk against reward), or its turn total and the human player's current total. Since Neller and Presser have calculated the optimal Pig strategy, you could read their 2004 paper and code that strategy into your software player. Be forewarned: understanding that paper requires knowledge of linear algebra, and optimizing your software Pig player's strategy per Neller/Presser's method would be an extremely complex undertaking.
*This is actually an assignment from the Edhesive Introduction to Computer Science course that the Smyth County School Board is considering as a course offering for students beginning in the 2016-2017 school year.
This last problem is probably the hardest. I can't say for certain, because I haven't solved it myself. (I came up with all of the other problems, so I knew how difficult they would be before I published them.) Since this problem is probably going to take considerable effort to solve, I am going to release it early to give you maximum time to work on it.
Throughout history, warring armies and nations have devised various methods of keeping their battle plans and other communications secret. The Roman commander Julius Caesar used a simple substitution code to encrypt messages to his subordinates. A Caesar cipher just substitutes each letter of the plaintext message with a letter of the alphabet that is shifted a certain number of spaces to the left or right. For example, A could be replaced by D, B by E, C by F, and so on. Caesar ciphers are trivial to break, so cryptographers (people who create secret codes) developed more and more complicated ciphers down through the centuries.
During World War II, Germany used a mechanical cryptographic device called an Enigma machine to encrypt its military messages. The Enigma looked like an old-fashioned typewriter. When the operator typed a message, the machine produced an encoded version of the message, in which each letter was represented by a different substitution cipher. The genius of the Enigma was that its design prevented a codebreaker from using brute force attacks to decrypt its messages. (A brute force attack is one in which a codebreaker simply uses all the possible settings of a cipher to break a code. The military version of the Enigma had nearly 159 quintillion settings, and the Germans changed the settings every day, so it was not possible to use brute force methods against the Enigma.) Despite the complexity of the Enigma code, Allied codebreakers succeeded in cracking it. Alan Turing, the father of modern computer science, was one of the chief codebreakers responsible for breaking Enigma.
The Allied codebreakers were headquartered at an installation called Bletchley Park in England. Each day, they would receive a batch of Enigma-encrypted messages intercepted by Allied radio operators, and they would set to work on deciphering the messages. For the final challenge, consider yourself a member of the Bletchley Park team. Visit the web-based Enigma Cipher Challenge and use your codebreaking skills to decipher Message 1.
Earn 600 points by sending me the original German and an English translation of Message 1. (The messages are encoded in German.)
I haven't deciphered Message 1 yet, but I will share my strategy for approaching the problem:
Good luck! Check back here from time to time. If I come up with a more accurate translation of the contents of the German on the scrap of paper, I will post them here.
If trying to solve this problem causes you great mental stress, just remember the pressure the Bletchley Park codebreakers were under—millions of lives and the future of civilization hung in the balance when they were trying to crack Enigma.