Setting Up A Modular Subversion Repository For PHP-Driven Websites
Willem Bogaerts - Kratz Business Solutions
Abstract
Sharing code between projects is still not a trivial matter with subversion. Especially if you are familiar with SourceSafe, you will find that subversion makes it hard to share code. Subversion seems to be really great at creating a version mess and good at solving one, but the reason I need source code control is to prevent such a mess in the first place. Sharing is an area where subversion could be greatly improved, but it is not impossible. This howto demonstrates a directory setup that takes the subversion sharing mechanism into account, as well as other issues that repositories bring.
Convention In This Howto
You will see <repository> in a lot of places. Replace this with the root of your repository. The root of your repository is a URL that usually starts with https://, file:/// or svn://.
It is assumed that you know what subversion is, that you know its basic use and that you have or can create a repository.
This howto does not give you "the best" way to organize a repository, because it depends on your needs. It aims to help by showing some of those decisions and how they affect the structure of the repository.
Subversion's Way Of Sharing Code
Directories in a working copy can contain links to other repositories by defining an svn:externals property. This will cause the linked repository directory to be included in your working copy, but will not make it part of the project itself. This means that you will see the directory with all files in your working copy, but not in the central repository. You can only see the property in the central repository.
There are a few drawbacks to this way of sharing:
- You can only give absolute URLs for the links (relative URLs are accepted only from subversion 1.5 onward), so seamlessly migrating a repository is nearly impossible. Even switching protocols (svn:// to https:// for example) will bring broken working copies and a lot of trouble.
- You cannot link files, only directories (file externals arrived only in subversion 1.6). This will have a great impact on the organization of the code.
PHP Directory Issues
There are a few things to think about when working with directories on a website, and with PHP in particular. As a matter of security, we do not want to put all sources in a directory that is accessible from a browser. The only files we will put there are files that need to be called by a browser. I call these files "running" files as opposed to "defining" files that only have class definitions or function definitions inside them. Mixing running code and defining code in one file is generally not a good idea.
Also as a matter of security, many sites have a restricted area that contains unit tests, error log pages, or even a complete backoffice site. This restricted area is usually password-protected by the webserver.
So a project contains directories with defining files and a web root directory (often www/ or htdocs/) that contains running files and optionally a restricted area.
PHP And Relative Directories
Alas, PHP has a very counter-intuitive way of determining the location of an included file. The commands for inclusion of a file all work with respect to the first-called file, and not with respect to the current file. To make things even worse, the current file location is used when the original location does not lead to an existing file.
This sounds complicated, and it is, so here is an example:
Suppose you call a page "index.php". This page includes a page "library/functions.php". This in turn includes "settings.php". You would suspect that "settings.php" is searched for in the library directory, but it is not. It is searched for in the same directory as "index.php"!
As said above, PHP continues to look in the expected directory if the file is not found. So everything seems to work as you expect, until you encounter files with the same name in different directories. You then have a really hard time finding out why PHP "suddenly" picks the file from the wrong directory.
This means that you cannot safely include another file with relative paths. We must make them absolute with code like:
require_once(dirname(__FILE__) . '/library/functions.php');
Do not forget the '/' at the start of the relative path, as the dirname function returns paths without trailing slashes.
Organization Of Our Repository
There are a few things to consider for the project's code. For development, it is convenient to have the whole project checked out as a whole. But for a live server, this may not be what you want. You may want to check out database code (SQL scripts) in a directory that is far away from the web directories, maybe even on another server. There may be parts of the project that you do not want at all on a server, but are needed in development, like documentation files.
Furthermore, we want a central place to store the shared code in. We could theoretically "borrow" code from another project, but it would be really hard to follow which projects depend on which other projects. Instead, we will move any standard code to a central location.
Root Directories
The central location containing all standard code will be "<repository>/standard/". This directory will of course be subdivided into the standard libraries. Projects will be in the folder "<repository>/projects/", which can be subdivided by client and project, for example. Code from outside, like downloaded libraries such as PHPMailer or FPDF, will be put in "<repository>/external/".
Branches And Tags
It is good practice in a subversion repository to keep your code in a branch called "trunk" and create other branches at the same level if needed. Even if you do not want any branches yet, make a directory "trunk" directly below a project. Trunk is the active branch.
Think about this. When we link to trunk of a standard library, we link to the active branch. This means that all error corrections in the linked library will arrive whenever we update a working copy. But if we introduce an error, this will also arrive in our working copies. Even the one on a live web server! You may want to link to a more or less stable branch instead, but fixing errors will then require slightly more overhead.
Whatever link you choose, it is good to know that you can always switch later.
What's In A Component Or Project
There are a lot of things we want to put in a repository, and we do not want everything in the same place on our web server. Some things are best not put on a web server at all or may be checked out to a different server, like a database server.
Note that the setup of a working copy on a development machine differs from the one on a server. On a development machine, you will probably have a central root directory for all your projects. This directory is then configured as accessible via your localhost web server, so you don't have to reconfigure your server for each project you are working on. On a live server, things must be configured anyway, and security considerations cause us to check out only those files that are needed and nothing more. For a start, we make the following directories in each component (if needed):
- documentation/: Customer input, database and object schemas, etc. No need to put this on a web server, but it is very useful for developers.
- sql/: Database creation, update and conversion scripts, for checkout on the database server.
- code/: The actual application code.
- test/: Unit tests.
Roughly the same directories appear in the projects, with the difference that unit tests should be runnable and are therefore part of the code section. If you do not want unit tests to be present on a live server, you can keep them apart. For projects, my setup will be:
- documentation/: As above. Also contains references from used components.
- database/: Contains database creation scripts and references from sql directories from used components.
- web/: The defining application code.
- web/htdocs/: The site root with the actual running application code, HTML pages and other web content, like stylesheets and images.
- web/htdocs/restricted/: Restricted area.
- web/test/: Unit tests. Contains references from the tests of the used components.
- selenium/: Functional tests (see http://selenium.openqa.org/).
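The project layout listed above can be created on disk before it is imported into the repository. The following is only a minimal Python sketch of that step: the directory names come from the list above, while the function name create_project_skeleton and the demonstration path are hypothetical.

```python
import os
import tempfile

# Directory names taken from the project layout described above.
PROJECT_DIRS = [
    "documentation",
    "database",
    "web",
    "web/htdocs",
    "web/htdocs/restricted",
    "web/test",
    "selenium",
]


def create_project_skeleton(trunk_dir):
    """Create the empty project layout below trunk_dir and return the created paths."""
    created = []
    for relative in PROJECT_DIRS:
        path = os.path.join(trunk_dir, *relative.split("/"))
        os.makedirs(path, exist_ok=True)  # harmless if the directory already exists
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstration in a throwaway directory; use your own trunk path instead.
    demo_trunk = os.path.join(tempfile.mkdtemp(), "trunk")
    for path in create_project_skeleton(demo_trunk):
        print(path)
```

The resulting tree could then be brought into the repository with svn import, pointing it at the trunk directory of the project. Creating the directories with svn mkdir or a graphical client works just as well.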
Running And Defining Code (Again)
We could put running code and defining code in separate directories, so that we could check out the running code into a separate directory inside the web root. However, this means that the subdirectory containing that code would become part of the URL (like www.example.com/restricted/errorhandling/viewerrorlog.php), and that may not be what you want (you may want www.example.com/restricted/viewerrorlog.php). There is a solution to this. We can put the running files in a non-accessible place and put a kind of proxy in an accessible place. This proxy is a PHP file with nothing more than an include or require statement that points to the file in the non-accessible location:
<?php
// Errorlog, which is called <project dir>/htdocs/restricted/viewerrorlog.php
// It is a proxy that points to <project dir>/errorhandling/viewerrorlog.php, which cannot be called directly by a browser.
// (because it is outside the web root)
require(dirname(__FILE__) . '/../../errorhandling/viewerrorlog.php');
?>
The directory errorhandling is then shared from the standard library and contains "defining" class files to handle errors and a "running" file to view them.
Machine-Dependent Settings
Settings files are a bit unusual in a repository. You do want to have them in a repository, but you do not want them to get updated automatically. I use an in-between method for my settings files: I create a file called settings_example.php in the settings directory. This file contains all the settings with comments on how to set them for different machines. It also contains a comment that says you have to copy it to a file called "settings.php".
I only refer to settings.php from other files, and I set an svn:ignore property with the value "settings.php" on the settings directory. This means a working copy will not run "out of the box", but it would not do so anyway because of its dependency on machine-dependent settings. The svn:ignore property keeps settings.php out of the repository, so settings from other machines cannot overwrite yours on an update.
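The copy step for a fresh working copy can be scripted. The sketch below is one possible way to do it, not part of the original setup: the function name ensure_settings is hypothetical, and it assumes that settings_example.php sits in the directory you pass in. It never overwrites an existing settings.php.

```python
import os
import shutil
import tempfile


def ensure_settings(settings_dir):
    """Copy settings_example.php to settings.php if settings.php is missing.

    Returns True if a new settings.php was created, False if one already existed.
    """
    example = os.path.join(settings_dir, "settings_example.php")
    target = os.path.join(settings_dir, "settings.php")
    if os.path.exists(target):
        return False  # never overwrite machine-specific settings
    shutil.copyfile(example, target)
    return True


if __name__ == "__main__":
    # Demonstration in a throwaway directory with a dummy example file.
    demo_dir = tempfile.mkdtemp()
    with open(os.path.join(demo_dir, "settings_example.php"), "w") as handle:
        handle.write("<?php // copy this file to settings.php and edit it ?>\n")
    print("created:", ensure_settings(demo_dir))
    print("created on second run:", ensure_settings(demo_dir))
```

After the copy, settings.php still has to be edited by hand for the machine at hand. The ignore property itself can be set from the command line with svn propset svn:ignore settings.php <settings directory>, followed by a commit.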
Putting It Into Practice
An example. Say we have a project that needs to send mail and create PDF files. We have decided that we use the PHPMailer from PEAR and the FPDF libraries. To avoid imposing any special restrictions in PHP setup, we just download the PHPMailer and do not use the "pear install" method to install it.
Also, this project will have standard error handling and a standard database class. For the sake of simplicity, I suppose you start with an empty repository.
Creating The Needed Directories
First, create the root directories described above: <repository>/external/, <repository>/standard/ and <repository>/projects/. I recommend using a graphical frontend to subversion, such as TortoiseSVN or RapidSVN.
We will deal with the external libraries first. Download and unpack PHPMailer and FPDF, and import them into <repository>/external/phpmailer/trunk/ and <repository>/external/fpdf/trunk/ respectively. If you like, you can create branches "firstdownload" from both of them.
Next, we create a project to work on. This project is for customer "CustomerInc" and the project is named "SamplePrj". I'm positively sure that you can come up with better names for your projects. So we create the full path of <repository>/projects/CustomerInc/SamplePrj/trunk/. The "trunk" is the main branch: you may be working in other (development) branches, but they are eventually merged to trunk. If you are working on the trunk (and why wouldn't you if it is a new project), this directory is the directory to check out as a working copy. But let's wait with that. Under the trunk, we create "purpose" directories for web code (php, html, etc.), database code and documentation.
Under the web directory, I created a web root (htdocs) and a directory for generic project-specific php files.
Creating The Shares
Until now, I did everything on the repository directly. But to share code, we have to make a working copy first. So let's do that.
Because our localhost webserver has to be able to reach all our project code, I make my working copy in a root folder of my filesystem (/projects/<username>/<projectname>) or harddisk (C:\projects\<username>\<projectname>). Otherwise, the webserver needs to have access rights in the home directory of all users that use subversion. By using a root folder, you can ensure that the webserver can access them all, and you can even separate directories for different users. For unix-based systems, all subversion users should be members of your webserver's group, and the whole projects directory should be at least group-readable. At least the web stuff, that is.
In our working copy, we have the directory web, where the htdocs directory is located and where we want the links to FPDF and PHPMailer. To create those links, add a subversion property svn:externals to the web directory with the following value:
fpdf <repository>/external/fpdf/trunk/code
phpmailer <repository>/external/phpmailer/trunk/code
(yes, the value consists of 2 lines) and then update your working copy. You should now get two extra directories with your update, as shown in this screenshot of a working copy of our project.
Note: I created the links to trunk. You may link to a branch instead or link to a specific revision number. I created only external references to the code sections. You would probably want to create external references to the documentation of the external libraries as well.
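Because every line of the property carries the full repository URL, it can help to generate the value instead of typing it. The following sketch does that under a few assumptions: the function name externals_value is hypothetical, the repository root in the demonstration is a made-up URL, and the output uses the "<local directory> <URL>" line format shown above.

```python
def externals_value(repository_root, links):
    """Build an svn:externals value: one '<local dir> <absolute URL>' line per link.

    links is a list of (local directory, path below the repository root) pairs.
    """
    root = repository_root.rstrip("/")
    lines = []
    for local_dir, repo_path in links:
        lines.append("%s %s/%s" % (local_dir, root, repo_path.strip("/")))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    value = externals_value(
        "https://svn.example.com/repo",  # hypothetical repository root
        [
            ("fpdf", "external/fpdf/trunk/code"),
            ("phpmailer", "external/phpmailer/trunk/code"),
        ],
    )
    print(value, end="")
```

If the output is saved to a file, say externals.txt, it can be applied with svn propset svn:externals -F externals.txt web, followed by an update and a commit. Regenerating the file with a different root is then the only change needed when the repository URL changes.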
Moving Useful Code To The Standard Library
As your projects mature, you develop more and more code that would be useful in other projects. In our case, let's say that we have developed an error handling package and a database class. Of course, the first step is to remove any project dependencies from this code. Put the code to share (let's call it a package) in a separate directory, as it must be in a separate directory when it is shared back from the standard library. So we make sure that the generic error handling code is in a directory errorhandling and the database class is in a directory database, both directly under the web directory.
You can use the svn mkdir and the svn move commands or any graphical subversion client to move the package in your project to <repository>/standard/<package_name>/trunk/code. Don't forget to update your development working copy after this (you will see the package directory disappear), then add two lines, one per package, to the svn:externals property (of the form <package_name> <repository>/standard/<package_name>/trunk/code), update (to get the directory back, now from the standard library) and commit the property change. Most problems arise when updating your working copies, especially if a working copy is a live website for which you want as little downtime as possible.
If you update your working copies after each change on the server, nothing can go wrong. The problem arises when you have a working copy that still has the package in the original project location and you want to update to a state where that same directory now comes from an svn:externals reference. If you try that, you will get an error saying that the target directory for that external reference already exists, so the external reference cannot be created. This is easily solved by removing that directory manually before updating that working copy.
Move any unit tests, documentation, SQL scripts, etc. in a similar manner to the standard library.
Updating The Live Server
When more than one person has access to the web or database server, it is better to use one account for all of them. This ensures that the .svn directories do not suffer from conflicting account settings and user rights. Furthermore, you can deny that account write permissions on the repository, so a compromised webserver cannot easily commit nasty stuff.
On a Linux web server, the following rights make sense: read and write for the user that updates the "live" working copy, read rights for the group (which is the web server's group) and no rights for others. If you set the SGID bit on the directories, all added files will also be accessible to the web server. You can set the SGID bit with chmod g+s <directory>. Consider setting the umask to 0027 to set the permissions right for added files also. A freshly added htdocs directory may then look like:
drwxr-s--- 9 webdev abyssd 4096 2007-09-08 12:42 restricted/
-rw-r----- 1 webdev abyssd 441 2007-09-08 12:42 index.php
-rw-r----- 1 webdev abyssd 2954 2007-09-08 12:42 main.css
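To bring an existing tree in line with the listing above, the modes can be applied in one pass. This is a sketch under stated assumptions, not a hardened deployment tool: the function name apply_live_permissions is hypothetical, and it only changes modes. Setting the group to the web server's group (with chgrp, for example) remains a separate step.

```python
import os
import stat
import tempfile

DIR_MODE = 0o2750   # drwxr-s---: SGID bit set, no rights for others
FILE_MODE = 0o640   # -rw-r-----


def apply_live_permissions(root):
    """Apply the directory and file modes described above to everything below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        os.chmod(dirpath, DIR_MODE)
        for name in filenames:
            os.chmod(os.path.join(dirpath, name), FILE_MODE)


if __name__ == "__main__":
    # Demonstration in a throwaway directory that mimics a small htdocs tree.
    demo_root = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo_root, "restricted"))
    with open(os.path.join(demo_root, "index.php"), "w") as handle:
        handle.write("<?php ?>\n")
    apply_live_permissions(demo_root)
    for name in sorted(os.listdir(demo_root)):
        mode = stat.S_IMODE(os.stat(os.path.join(demo_root, name)).st_mode)
        print(name, oct(mode))
```

With the SGID bit and a umask of 0027 in place for the account that runs svn update, later updates should keep producing files with these permissions, so a pass like this is mainly useful once, on a tree that was checked out before the bits were set.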
Modular Database Code
We already discussed how includes work in PHP, but how do we include files in SQL? We need to be able to include different files from different directories in SQL code, as we organized our repository this way. If we share PHP code, we should share the corresponding SQL code as well. I wrote a little PHP script and a little Python script to do that. You can find them at http://www.w-p.dds.nl/sqlincludeparser_php.txt (PHP) and http://www.w-p.dds.nl/sqlincludeparser_py.txt (Python). These scripts allow you to define include statements in a base file of the form:
CREATE DATABASE IF NOT EXISTS someDatabase;
USE someDatabase;
-- @include(errorhandling/errortables.sql) # (re)creates tables used by the errorhandling package
DROP TABLE IF EXISTS someTable;
CREATE TABLE someTable (someID INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, ...etc.
If you make every included file and the base file repeatable, you can use it to get to a clean state over and over again, which can be very useful for testing and developing. To use it, just pipe the output of my script to the sql command-line client:
sqlincludeparser.py <base file location> | mysql -u <administrator user> -p [<other mysql options>]
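For readers who want to see the idea without downloading the scripts, here is a minimal sketch of such an include expander. It is not the script linked above and may behave differently: the function name expand_includes is hypothetical, included paths are assumed to be relative to the file that contains the @include line, and circular includes are reported as an error.

```python
import os
import re
import sys

# Matches lines such as: -- @include(errorhandling/errortables.sql) # comment
INCLUDE_PATTERN = re.compile(r"^\s*--\s*@include\(([^)]+)\)")


def expand_includes(path, _active=()):
    """Return the text of the SQL file at path with every @include line expanded.

    Included paths are resolved relative to the including file (an assumption).
    """
    path = os.path.abspath(path)
    if path in _active:
        raise ValueError("circular include: " + path)
    base_dir = os.path.dirname(path)
    output = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            match = INCLUDE_PATTERN.match(line)
            if match:
                included = os.path.join(base_dir, match.group(1).strip())
                output.append(expand_includes(included, _active + (path,)))
            else:
                output.append(line)
    text = "".join(output)
    # An included file without a final newline must not glue two statements together.
    if text and not text.endswith("\n"):
        text += "\n"
    return text


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.stdout.write(expand_includes(sys.argv[1]))
    else:
        print("usage: expand_sql_includes.py <base file location>", file=sys.stderr)
```

Saved under a name of your choice (expand_sql_includes.py here is only a placeholder), the output can be piped to the mysql client in the same way as shown above.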