Not long ago, my team received news of the repository that we'll be working on. That was good news, until we encountered a troubling bug in one of the integration tests. We call it the phantom bug. Why? Because sometimes the test case fails and sometimes it passes, like magic. Here's what the phantom bug looks like:
    + ../scripts/run_tests.sh
    + set -e
    + echo 'Running Tests'
    Running Tests
    + [[ -n '' ]]
    + bin/phpunit -c app/ --verbose --coverage-clover build/logs/clover.xml --coverage-html=coverage/
    PHPUnit 4.8.14 by Sebastian Bergmann and contributors.

    Runtime: PHP 5.5.9-1ubuntu4.14 with Xdebug 2.4.0RC3
    Configuration: /opt/codebender/eratosthenes/Symfony/app/phpunit.xml.dist

    .........I.....I.PHP Fatal error: Call to a member function getOwner() on a non-object in /opt/codebender/eratosthenes/Symfony/src/Codebender/LibraryBundle/Tests/Controller/ViewsControllerFunctionalTest.php on line 132
    PHP Stack trace:
    PHP 1. {main}() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/phpunit:0
    PHP 2. PHPUnit_TextUI_Command::main() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/phpunit:47
    PHP 3. PHPUnit_TextUI_Command->run() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/TextUI/Command.php:100
    PHP 4. PHPUnit_TextUI_TestRunner->doRun() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/TextUI/Command.php:149
    PHP 5. PHPUnit_Framework_TestSuite->run() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:440
    PHP 6. PHPUnit_Framework_TestSuite->run() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/Framework/TestSuite.php:747
    PHP 7. PHPUnit_Framework_TestCase->run() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/Framework/TestSuite.php:747
    PHP 8. PHPUnit_Framework_TestResult->run() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/Framework/TestCase.php:724
    PHP 9. PHPUnit_Framework_TestCase->runBare() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/Framework/TestResult.php:612
    PHP 10. PHPUnit_Framework_TestCase->runTest() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/Framework/TestCase.php:768
    PHP 11. ReflectionMethod->invokeArgs() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/Framework/TestCase.php:909
    PHP 12. Codebender\LibraryBundle\Tests\Controller\ViewsControllerFunctionalTest->testAddGitLibrary() /opt/codebender/eratosthenes/Symfony/vendor/phpunit/phpunit/src/Framework/TestCase.php:909
After some time, we noticed a pattern: the test seemed to fail only after we had run it more than twice. At first, we thought some files were being corrupted between test runs. I tested this hypothesis by using Git to track all my files between runs, but none of my local files changed, so that was not the cause. Then I suspected the database, so I tried deleting it between every test run. The phantom bug continued to appear and disappear. So what could it be, if it was related neither to the local files nor to the local database?
I should have known that the cause was something remote. I found it only after several hours of debugging:
    $ curl "https://api.github.com/repos/codebendercc/webserial/git/refs/heads?client_id=&client_secret=" -A "Eratosthenes"
    {
      "message": "API rate limit exceeded for XXX.XXX.XXX.XXX. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
      "documentation_url": "https://developer.github.com/v3/#rate-limiting"
    }
It turns out that the GitHub API sets a rate limit for each IP address on unauthenticated requests. After two test runs, my IP address had exhausted its limit, so in subsequent runs the API returned a rate-limit error instead of the desired object, which caused the fatal error in the integration test. The solution is to authenticate the requests through GitHub's OAuth so that we get a higher rate limit. However, at this point, we do not have the authorization for this, so we will have to bear with the issue until we receive the authentication tokens. For now, at least we have solved the mystery of the phantom bug and, in the process, developed a greater understanding of the application architecture.
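While we wait for the tokens, the failure itself could at least be made less mysterious. Below is a minimal sketch of the idea, written in Python purely for illustration (the project itself is PHP, and none of this is taken from its code). The names `RateLimitError`, `auth_headers` and `parse_github_response` are hypothetical. The sketch assumes the error body looks like the one in the curl output, and that the token is sent in an `Authorization: token ...` header.

```python
import json

# Hypothetical helper names; an illustration, not the project's actual code.


class RateLimitError(RuntimeError):
    """The API answered with a rate-limit message instead of data."""


def auth_headers(token=None, user_agent="Eratosthenes"):
    """Build request headers, adding an OAuth token when one is available."""
    headers = {"User-Agent": user_agent}
    if token:
        # Assumption: the token is passed via the Authorization header.
        headers["Authorization"] = "token " + token
    return headers


def parse_github_response(body):
    """Decode a JSON response body, failing loudly on a rate-limit error."""
    data = json.loads(body)
    if isinstance(data, dict):
        message = str(data.get("message", ""))
        # Match the wording seen in the error body from the curl call.
        if "rate limit exceeded" in message.lower():
            raise RateLimitError(message)
    return data
```

With a check like this in front of the code that uses the response, an exhausted quota would show up as an explicit rate-limit error rather than as a call to `getOwner()` on a non-object, which might have saved us a few hours of chasing files and databases.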