MediaWiki 1.11 title extraction bug

From Organic Design wiki
Legacy: This article describes a concept that has been superseded in the course of ongoing development on the Organic Design wiki. Please do not develop this any further or base work on this concept; it is only useful as a historic record of work done. You may find a link to the currently used concept or function in this article; if not, you can contact the author to find out what has replaced this legacy item.

The problem

There has been trouble upgrading to MediaWiki 1.11, and this article documents my investigation into the problem. The symptom is that if $wgArticlePath is set to "/$1", any long-form URL request that passes title as a query-string parameter fails and is redirected to a non-existent article called Wiki/index.php. In other words, the long-form URL is being treated as a friendly URL. (Bug 11428)

The cause

There have been some significant changes to the way the article title is extracted from the request in version 1.11. A new method called interpolateTitle has been added to the $wgRequest singleton object, which is defined in includes/WebRequest.php and is called from includes/Setup.php. The problem has been isolated further to another new 1.11 method called extractTitle, which is called from within interpolateTitle and is shown below. This function returns an array of key/value pairs, which are then written back into $_GET and $_REQUEST so that they appear in the environment as if they came from a normal long-form URL request.

The problem is that when $wgUsePathInfo is set to true (and we do want it to be true so that we can use un-encoded ampersands and question marks in article titles), this function returns the wrong value for the title key when title is a query-string item. extractTitle only examines the URL path, not the query string, and with $wgArticlePath set to "/$1" the base it compares against reduces to "/", which is a prefix of every path. A request for /wiki/index.php?title=Foo therefore yields a title of "wiki/index.php", which overwrites title=Foo. I think it should return an empty array, or not be called at all, in the case of long-form URLs.

/**
 * Internal URL rewriting function; tries to extract page title and,
 * optionally, one other fixed parameter value from a URL path.
 *
 * @param string $path the URL path given from the client
 * @param array $bases one or more URLs, optionally with $1 at the end
 * @param string $key if provided, the matching key in $bases will be
 *        passed on as the value of this URL parameter
 * @return array of URL variables to interpolate; empty if no match
 */
private function extractTitle( $path, $bases, $key=false ) {
	foreach( (array)$bases as $keyValue => $base ) {
		// Find the part after $wgArticlePath
		$base = str_replace( '$1', '', $base );
		$baseLen = strlen( $base );
		if( substr( $path, 0, $baseLen ) == $base ) {
			$raw = substr( $path, $baseLen );
			if( $raw !== '' ) {
				$matches = array( 'title' => rawurldecode( $raw ) );
				if( $key ) {
					$matches[$key] = $keyValue;
				}
				return $matches;
			}
		}
	}
	return array();
}
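To see the failure without a running wiki, the function above can be copied out of the WebRequest class and called directly. The sketch below does that. It also adds a hypothetical extractTitleGuarded() wrapper, which is neither MediaWiki code nor an upstream fix. The wrapper follows the suggestion above by returning an empty array whenever the path points at the script itself. The script location /wiki/index.php is an assumption matching the URLs used in this article.

```php
<?php
// Standalone copy of the MediaWiki 1.11 extractTitle() logic shown above,
// taken out of the WebRequest class so it runs without MediaWiki.
function extractTitle( $path, $bases, $key = false ) {
	foreach ( (array)$bases as $keyValue => $base ) {
		// Find the part after $wgArticlePath
		$base = str_replace( '$1', '', $base );
		$baseLen = strlen( $base );
		if ( substr( $path, 0, $baseLen ) == $base ) {
			$raw = substr( $path, $baseLen );
			if ( $raw !== '' ) {
				$matches = array( 'title' => rawurldecode( $raw ) );
				if ( $key ) {
					$matches[$key] = $keyValue;
				}
				return $matches;
			}
		}
	}
	return array();
}

// Hypothetical guard (not MediaWiki code): a path that is the script itself,
// or starts with the script path plus "/", is not an article-path URL, so
// return no matches and leave any query-string title untouched.
function extractTitleGuarded( $path, $bases, $scriptPath ) {
	if ( $path === $scriptPath || strpos( $path, $scriptPath . '/' ) === 0 ) {
		return array();
	}
	return extractTitle( $path, $bases );
}

$articlePath = '/$1';             // the $wgArticlePath setting from this article
$scriptPath  = '/wiki/index.php'; // assumed location of index.php

// Long-form request: parse_url() keeps only the path, so the
// title=Foo in the query string is never seen by extractTitle().
$path = parse_url( 'http://unused/wiki/index.php?title=Foo', PHP_URL_PATH );

// The base reduces to "/", a prefix of every path: title is "wiki/index.php"
print_r( extractTitle( $path, $articlePath ) );

// With the guard: empty array, so title=Foo would survive
print_r( extractTitleGuarded( $path, $articlePath, $scriptPath ) );

// A friendly URL such as /Foo still yields title "Foo"
print_r( extractTitleGuarded( '/Foo', $articlePath, $scriptPath ) );
```

The guard also returns nothing for /wiki/index.php/Foo style requests. On its own it would therefore only be half a fix, because those requests would still need PATH_INFO handling like the 1.10 constructor's to supply their title.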

WebRequest Patch

One thing the new 1.11 code shows is that the problem can only occur when the $wgUsePathInfo global is set to true. Setting it to false, however, means that the mod_rewrite rules must translate every friendly request into a full long-form query string. Un-encoded ampersands are then treated as query-string separators and cannot be used in article titles.
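The ampersand trade-off can be demonstrated in plain PHP. The sketch below uses two hypothetical helper functions, neither of which is part of MediaWiki. The first parses a query string with parse_str(), which is what happens when a rewrite rule produces index.php?title=.... The second mimics the 1.10 constructor's substr() on PATH_INFO. The page title Foo&Bar is only an example.

```php
<?php
// Hypothetical helper: $wgUsePathInfo = false, so the rewrite rule must build
// a query string such as "title=Foo&Bar", and PHP splits it on every "&".
function titleFromQueryString( $query ) {
	parse_str( $query, $get );
	return isset( $get['title'] ) ? $get['title'] : null;
}

// Hypothetical helper: $wgUsePathInfo = true, so the request arrives as
// /wiki/index.php/Foo&Bar and, like the 1.10 constructor, everything after
// the leading slash of PATH_INFO becomes the title, ampersand included.
function titleFromPathInfo( $pathInfo ) {
	return substr( $pathInfo, 1 );
}

echo titleFromQueryString( 'title=Foo&Bar' ), "\n"; // Foo ("Bar" becomes a separate parameter)
echo titleFromPathInfo( '/Foo&Bar' ), "\n";         // Foo&Bar
```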

I'm not sure what they're trying to do with the new function, so until a proper solution is available I've simply replaced the WebRequest constructor with the one from 1.10 and made the interpolateTitle method return without doing anything. This allows $wgUsePathInfo to work for /wiki/index.php/foo style requests (but incidentally, %, # and ? don't seem to have been working in any of our wikis for some time now!). Here's a snippet of includes/WebRequest.php with the patch applied:

class WebRequest {
	function __construct() {
		$this->checkMagicQuotes();
		# The rest of the code in this function is from MediaWiki 1.10
		global $wgUsePathInfo;
		if ( $wgUsePathInfo ) {
			if ( isset( $_SERVER['ORIG_PATH_INFO'] ) && $_SERVER['ORIG_PATH_INFO'] != '' ) {
				# Mangled PATH_INFO
				# http://bugs.php.net/bug.php?id=31892
				# Also reported when ini_get('cgi.fix_pathinfo')==false
				$_GET['title'] = $_REQUEST['title'] = substr( $_SERVER['ORIG_PATH_INFO'], 1 );
			} elseif ( isset( $_SERVER['PATH_INFO'] ) && ($_SERVER['PATH_INFO'] != '') && $wgUsePathInfo ) {
				$_GET['title'] = $_REQUEST['title'] = substr( $_SERVER['PATH_INFO'], 1 );
			}
		}
	}

	/**
	 * Check for title, action, and/or variant data in the URL
	 * and interpolate it into the GET variables.
	 * This should only be run after $wgContLang is available,
	 * as we may need the list of language variants to determine
	 * available variant URLs.
	 */
	function interpolateTitle() {
		return; # add this to disable the new 1.11 title extraction functionality
		global $wgUsePathInfo;