-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type conversions refactorization #16
Conversation
…ch the base class.
…by nonrel query implementations).
…ther than get_db_prep_value for DecimalField.
…me more comments to NonrelQuery methods.
…rt_values_from_db) to allow symmetric handling of encoding and decoding.
…o not doing any type casting for AutoField if the latter is true.
…llow back-ends to override the main type (for example to easily use list storage for SetFields).
… fit better; added db_table argument.
…gic). Use <lookup info (column, type, operator, negated)>, <value>, <value info (starting with field)> argument order everywhere.
…e, so it can be used as a part of a key.
…) 3-tuple. Don't pass db_type to filtering-related methods (it was only used for encoding).
…e-String/Unicode) to plain strings for type-inspecting back-ends.
…irement is only that str(value) is reversible given the field kind.
…lanks in value_to_db_* functions.
…ary_key is usable and a feature to mark back-ends as supporing using fields other than AutoField as keys or not.
…er for subclasses and these fields should behave much like standard fields.
…es in a collection field.
…column used by the constraint (was: parent primary key, is: child foreign key).
…th a ListField more symmetric.
…lso allow the field to be used to prevent convert_value_to/from_db to be run on embedded instance fields' values like in the 0.4 version of the engine.
… as primary keys.
Conflicts: djangotoolbox/fields.py djangotoolbox/tests.py
…h by gpitfield, thanks!
…lit moved into separate methods.
… conversions / deconversions of embedded instance fields' values.
…ry for legacy GAE foreign key storage.
…edModelField implementation and no-deconversion handling to the LegacyEmbeddedModelField.
…umn keys (fields are already available for filtering and updates).
…umn keys (fields are already available for filtering and updates) and pass the key returned by back-ends through database deconversions.
…also skips deconversions.
…the-box (to be chosen by a back-end developer).
… lists nor strings nor bytes lets not call it a database ;-)
I will run your changes agains. Django-lms and see if it breaks anything (mongo). Thanks. This is really great. Toolbox really needed a refactor. |
Holy shit O_O This is insane. Thanks so much! I don't think we can review this easily, though. It would really help if you could squash your changes and then split them into logical bits. Also, please keep all sorts of style fixes (line breaks, whitespace, typos, etc) in one separate commit as to reduce noise. From a quick glance:
Also, you should probably try to get as many of these changes as possible into Django vanilla. Most of the code is database agnostic and it's better for Django-nonrel to have very little code that differs from Django. (You could do that as a GSoC for example.) Edit: We also need to know of any backwards incompatibilities :-) |
On Tue, 2012-02-07 at 07:07 -0800, Adam Cezar Jenkins wrote:
Thanks, let me know if something does break. Toolbox compilers
|
On Tue, 2012-02-07 at 07:59 -0800, Jonas Haag wrote:
Thank you for the good word.
Yes, all true. I like to keep such formatting stuff separate too (even
It now also allows to keep using the old logic for embedded instance
The pickling only happens if a back-end developer chooses "string" or
I have to attend to matters of a more commercial nature for the next
|
Closing for now, going to make a new request after reducing and bringing some order into the the commits list. Anybody would mind if I just PEP-ified the whole toolbox? I probably should just restrain from making all those style changes, but I can't make myself to :-) Some of PEP's recommendations may feel awkward, but having something fixed should make further changes more readable. |
I think PEP-ifying it is a great idea. Toolbox and nonrel code can be a bit confussing, anything that makes it more readable is better. It couldn't really hurt anything. |
i don't mind if you PEP the whole thing, but maybe do it as a separate commit so its easier confirm that nothing breaks. |
Yes exactly, for toolbox it's already done -- pepification and all insignificant things in one commit (not 100% strictly to the standard as sometimes it actually decreases readability, but close enough). Currently working on GAE, but I'll probably just drop any insignificant changes that have wandered into the Mongo branch -- I find it's style specific, but fairly consistent (and there are some more significant things to work on :-) |
I wanted to say a big thank you for this! It's made understanding and following the code paths much easier. |
This is something that started as an effort to understand how things work, add some comments to make the process faster for others to come, and maybe clean some things up. However, some additional goals have appeared on the way.
Flaws in type conversions
The first was to fill missing parts of type conversions:
This should fix a few issues, the ones I know about are the Mongo part of #10 and django-nonrel/mongodb-engine#47.
Change to converting / deconverting of embedded instance fields can be disabled by using LegacyEmbeddedModelField (now in toolbox) in case of incompatibility with existing code (hopefully this will be rare, see the comment on NonrelDatabaseOperations.value_for_db_model for some insight). [Update: Not any longer, we've decided to drop this completely.]
Data types abstraction
The second goal was to build data type framework more suitable for a "general" NoSQL database rather than just the GAE datastore:
db_type
names suitable for a database driver that can process Python types directly;"key"
db_type
, unless the back-end decides otherwise;nonrel_db_type
method that makes it possible to override Django'sdb_type
logic even for custom fields;sql.Query
-- values passed to back-ends compilers are now always accompanied by a proper field (my feeling is that the "column" logic is not very natural for NoSQL databases);Nonrel fields
The third goal was to make things easier for back-end developers, by taking the burden of handling toolbox fields away (there are many subtleties which can easily lead to mistakes). Now all the code is within the toolbox and all a back-end develop has to do is choose a suitable collection or embedded field
db_type
for his/her database and call a parent conversion (deconversion) method at the beginning (end) of the back-end methods. There is likely enough to choose from to satisfy most NoSQL databases (see comments in creation.py to learn the available options).GAE: relations as db.Keys and binary serialization
The fourth was to prepare grounds for storing
ForeignKeys
and other relations using the specialdb.Key
type of the GAE datastore. A database option is implemented to switch between the two storage methods, but only a "dump data; switch; reload data" procedure is possible. The default is to use the legacy storage for backwards compatibility, but it could be changed with some future major revision. See django-nonrel/djangoappengine#18 for more info.GAE engine now also serializes
DictFields
andEmbeddedModelFields
using the binary pickle protocol (instead of the ASCII protocol 0); this should work transparently as deconversion is the same for both protocols and no lookups / filtering is specified for these fields (as far as I can tell).Additional tests and fixes
EmbeddedModelField.update
issue (EmbeddedModelField pre_save, get_db_prep_value rework #14) is already applied [update: already on the develop branch];ListField
withForeignKeys
that we've discussed on the mailing list (marked as an expected failure for now) [update: on a separate branch now];Django changes to be considered
related_db_type
infrastructure -- it's not sufficient for GAE relations-as-Keys and rather hard to followvalue_to_db_*
andconvert_values
methods on DatabaseOperations with something like the newvalue_for/from_db(value, field [, lookup])
would allow to avoid double recursion for collection fields, simplify back-end code (get_db_prep_*
would call these methods rather than compilers), and make it possible to avoid various hacks in toolbox code (e.g. by moving what is now inField.get_db_prep_lookup
-- and is more or less reverted inNonrelQuery._normalize_lookup_value
-- to an abstract SQLDatabaseOperations). See comments on the NonrelDatabaseOperations class if you'd like to know what I find unnecessarily hard to do with the current framework.Reviewing, testing & merging
I believe the most convenient way to review these changes is by reading creation.py and base.py of toolbox and the engine that you use. If you'd prefer to go through some commits the best bet is to go through the Mongo ones -- only these have been squashed; otherwise you'll find a story of taking bad paths, learning new aspects of the system and redesigning things a couple of times. Any kind of feedback is welcome (including linguistic corrections, comments improvements, code style suggestions, pointing out errors, better design ideas, etc.).
These changes could use some testing. Particularly with preexisting data -- the intention was to keep everything compatible, and a considerable effort was put into understanding what each piece in current conversions does and not lose it, but considering the amount and nature of changes I've probably broken some things. So if you have a project with a testing environment and some existing data, trying these branches out would be really nice -- just backup, update and see if everything works. If it does and it's a GAE project, dump the data, add
STORE_RELATIONS_AS_DB_KEYS: True
to your database options, reload and check if it still does.I've run toolbox / appengine / dbindexer / django.contrib (except flatpages and GeoDjango) test cases for GAE under Python 2.5 / 2.7 and the Mongo test suite under 2.6 / 2.7 some horrible amount of times. However, if you have the power to run the full Django test suite (django/tests/runtests.py) in reasonable time -- run it -- I've given up for now (or otherwise would have to postpone pushing this for another week or so ;-) Some tests are expected to fail, but no new failures should be introduced compared to develop branches.
There are branches also called
feature/type-conversions-refactor
in the GAE and Mongo engine repositories. Each of them depends on this one and this one depends on them -- they need to be merged together.WARNING: Although this fixes some things it also changes how things are stored, in some cases in a subtle, dynamic way (e.g.
DictFields
on GAE). It really should not be used in production until going through some more extensive, real-data testing. Nevertheless, in case somebody feels that lucky (especially considering that this is not likely to be merged any time soon :-) dumping your data before and reloading it after the update gives a better chance to avoid some yet unforeseen problems and I'd also highly recommend looking into your database (e.g. at _ah/admin) and checking that things are stored as they should be shortly after that.