Development/ReplicationSupport

Replication Support

A discussion of implementation of replication support in Propel 1.3

Current Implementation

Configuration

Propel configuration parsing will look for the presence of <slave> connection configurations in the runtime configuration file. If they are present, the ReplicationPDO class will be used (instead of the default, PropelPDO).

Usage of Slaves

ReplicationPDO delegates read queries (SELECT statements) to a *random* slave. The slave connection is only actually initialized when requested. The SlavePDO class is used for the slave connections (when they're initialized); this class overrides methods to ensure that it is only performing read queries.

PropelPDO

It is also possible to override the PDO subclass by specifying a <classname> within the <connection> for the master connection or the slave connections. It is worth noting that the generated code requires a PropelPDO object (for nested transaction support), so the custom class will have to subclass PropelPDO.

Discussion

Load balancing when using multiple slaves

Is there a reason to use random connections for every SELECT query?

[christian] Load balancing is done by picking a random slave. [hans] I understand now (see below)

Or should we iterate over slaves (if more than one)?

[christian] That would be a better solution, of course. [hans] (see below) I misunderstood when reading the code & thought that they were being picked every time. Now that I understand, this system makes fine sense.

Maybe compromise would be to start with a random slave and then loop over them. I don't know that calling rand() for every SELECT statement is a big performance penalty, but it doesn't seem entirely necessary.

[christian] The random pick should only take place at the beginning of a request. All subsequent slave-queries should use that connection to avoid overhead. The slave should only change from one request to another. [hans] Yeah, I understand now.

Reod only for Slaves

Is it strictly necessary to enforce READ-only in the SlavePDO? i.e. if a slave object is explicitly asked to perform an update, should this be allowed? I guess my concern is that by checking the SQL for 'SELECT' (but not 'SELECT INTO'), we are not comprehensively eliminating update statements --- e.g. "select sp_insert_into_my_table('foo', 'bar')". I don't think there's any good way for the ReplicationPDO to be able to guess, based on the SQL, which connection should be used. For that reason, I like Mortiz's suggestion of having this be explicitly requested as part of the getConnection() call -- i.e. the doSelect() methods know they can use a slave, other methods will use a master. When fetching your own connection, the default could be master, but you could also fetch a slave connection if you know that you're performing a SQL query. Putting this into the hands of the programmer, is probably wise, no?

[christian] I totally agree - it would be better to choose the right connection type on the doSelect() function. Parsing the SQL statement ist too unsafe as you've shown. But: that would change the OM generation, wouldn't it. [hans] Ok -- I think that changing the generated code in this case is OK, since it's not really going to break the API. The default will always be WRITE (master) anyway.

Default connection

From Mortiz's description below, is it safe to assume that for systems that don't care about replication Propel::getConnection() would always be returning a master connection -- regardless of whether a slave was requested?

[christian] yeah [hans] Ok, good. I think this will work ...