From jarl at gavia.dk Wed Oct 1 07:55:11 2008
From: jarl at gavia.dk (Jarl Friis)
Date: Wed, 01 Oct 2008 13:55:11 +0200
Subject: [ruby-dbi-users] Proposing patch for sqlite3 driver
In-Reply-To: <200809300748.35952.erik@hollensbe.org> (Erik Hollensbe's
 message of "Tue, 30 Sep 2008 07:48:35 -0700")
References: <200809292242.39040.erik@hollensbe.org>
 <200809300748.35952.erik@hollensbe.org>
Message-ID:

Erik Hollensbe writes:

> The problem with this is that while postgres's date format is variable, the
> parsing routines we use with DateTime are not. A smarter solution needs to be
> used, as a "strptime until it doesn't break" solution will likely cause more
> problems than it resolves.

I am not sure what you mean by "strptime until it doesn't break",
and I wonder what problems you have in mind when you say "will likely
cause more problems than it resolves."

Anyway, I think that my proposal is more robust than the
current implementation, because:
1) most installations/configurations will accept ISO dates.
2) dates formatted in ISO 8601 are not ambiguous.
3) The current solution silently creates corrupt data if your pg is
   configured to use dmy.

As you mention, my solution still requires the configuration to accept
ISO formats, but it will crash early[1] if pgsql is not configured so,
which is good. This is not the case with the current DBD::Pg.

So all-in-all I consider my proposal at least as good as the current
implementation. Yet I agree that it would be nice to have an even better
solution. The low-level pg-driver accepts a DateTime; could that be used
instead of handing over a string? Or should we do something like
to_timestamp('2008-09-30 12:34:34.123456','YYYY-MM-DD HH24:MI:SS.US');
when writing to the database? In that case we still need to do
something when reading from the database.

Jarl

Footnotes:
[1] http://www.pragmaticprogrammer.com/the-pragmatic-programmer/extracts/tips

From erik at hollensbe.org Wed Oct 1 16:59:26 2008
From: erik at hollensbe.org (erik at hollensbe.org)
Date: Wed, 01 Oct 2008 13:59:26 -0700
Subject: [ruby-dbi-users] Proposing patch for sqlite3 driver
In-Reply-To:
References: <200809292242.39040.erik@hollensbe.org>
 <200809300748.35952.erik@hollensbe.org>
Message-ID: <95c8612ff811e2a27e44618122df4954@localhost>

On Wed, 01 Oct 2008 13:55:11 +0200, Jarl Friis wrote:
> Erik Hollensbe writes:
>
>> The problem with this is that while postgres's date format is variable, the
>> parsing routines we use with DateTime are not. A smarter solution needs to be
>> used, as a "strptime until it doesn't break" solution will likely cause more
>> problems than it resolves.
>
> I am not sure what you mean by "strptime until it doesn't break",
> and I wonder what problems you have in mind when you say "will likely
> cause more problems than it resolves."

I'll explain this below.

> Anyway, I think that my proposal is more robust than the
> current implementation, because:
> 1) most installations/configurations will accept ISO dates.
> 2) dates formatted in ISO 8601 are not ambiguous.
> 3) The current solution silently creates corrupt data if your pg is
>    configured to use dmy.
>
> As you mention, my solution still requires the configuration to accept
> ISO formats, but it will crash early[1] if pgsql is not configured so,
> which is good. This is not the case with the current DBD::Pg.

If you s/ISO/DMY/g or s/ISO/MDY/g, you have the same exact problem with a
different date format. DBI is all about doing it the way you want in your
code, not the way I want you to, and changing the date format to something
just as fixed and arbitrary, when reality has already shown that approach
fails, is not a sound solution. This is no more robust than the solution
currently in place, which was based on assumptions similar to the ones
you're echoing here.

I guess what I'm saying is, if we do it your way, there will be a bug next
week with a DMY-only guy having the same problem, and considering my current
availability I can't afford that kind of busywork. I would prefer to do it
right the first time, where the format is generated based on what the
database it's connected to supports.

Doing this is currently non-trivial; the inbound conversion maps have other
issues that need to be addressed in a future version of DBI for other
reasons. However, nothing is stopping you from passing a string yourself
(which could be the result of DateTime.now.strftime if you want) or even
overriding the existing conversion map. Both of these are documented
liberally.

I also have serious doubts that it does this silently, as postgres has
typically been *very* anal about getting the format wrong in the past, but I
have no ability to test it at this moment.

> So all-in-all I consider my proposal at least as good as the current
> implementation. Yet I agree that it would be nice to have an even better
> solution. The low-level pg-driver accepts a DateTime; could that be used
> instead of handing over a string? Or should we do something like
> to_timestamp('2008-09-30 12:34:34.123456','YYYY-MM-DD HH24:MI:SS.US');
> when writing to the database? In that case we still need to do
> something when reading from the database.

The problem you have identified is spot on, but your solution suits you and
not the general case. While I can understand your concern:

1) Nothing is preventing you from working around it at this time.
2) Changing the format without changing the logic serves a different subset
   of people; it doesn't fix the problem.
3) I have, at this point, integrated and reverted several "it doesn't work
   my way so here's a patch to fix it" patches that have gone to great
   lengths to waste my time and embarrass me as I deal with frustrated
   people for whom I just broke everything that had worked when the
   behavior was less specific.

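A minimal sketch of the first workaround mentioned above (passing a string
yourself), assuming the server accepts ISO-style input. The helper name
iso_timestamp and the table and column names in the comment are hypothetical
and not part of DBI.

```ruby
require 'date'

# Hypothetical helper (not part of DBI): render a Date/Time-like value as an
# ISO 8601-style string, so no driver-side conversion is involved.
def iso_timestamp(value)
  value.strftime('%Y-%m-%d %H:%M:%S')
end

stamp = DateTime.new(2008, 9, 30, 12, 34, 34)
puts iso_timestamp(stamp)

# With a live handle the string is just another bind value. The table and
# column names are made up for illustration:
#   dbh.do('INSERT INTO events (created_at) VALUES (?)', iso_timestamp(stamp))
```

Because the value reaches the driver as a plain string, the conversion map
never has to guess a format for it.
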
Now that you understand my reasons for rejecting this idea, please be clear
that:

1) Type management is new in 0.4.0. Anything (yes, *anything*) other than a
   string that worked before was quite literally non-deterministic and could
   be easily described as "working by accident".
2) Type management is new, there are going to be bugs, and I can count on
   two fingers the number of people who went out of their way to test and
   report bugs with the alpha version I advertised before release (one of
   them is me).
3) Nothing is stopping you from completely turning it off or just passing a
   string that it would be coerced to anyway.

Now, I don't want to come off as bitter, because I'm really not, but I don't
think it's reasonable to expect me to handle this in the manner you'd
currently prefer. My workload is very high this week and I promise, as soon
as I have free time, I will focus my efforts on resolving this.

Thanks again for your time and comments,

-Erik

From kubo at jiubao.org Wed Oct 1 20:54:10 2008
From: kubo at jiubao.org (KUBO Takehiro)
Date: Thu, 2 Oct 2008 09:54:10 +0900
Subject: [ruby-dbi-users] Proposing patch for sqlite3 driver
In-Reply-To:
References: <200809292242.39040.erik@hollensbe.org>
 <200809300748.35952.erik@hollensbe.org>
Message-ID: <5d847bcd0810011754g289c4c76ld6f7cf2632ba9c5e@mail.gmail.com>

Hi,

On Wed, Oct 1, 2008 at 8:55 PM, Jarl Friis wrote:
> So all-in-all I consider my proposal at least as good as the current
> implementation. Yet I agree that it could be nice with an even better
> solution. The low-level pg-driver accepts a DateTime, could that be used

How about the following solution?
1. execute 'SHOW DateStyle;' just after establishing a connection.
2. scan the result to check the date style.
    - ISO
    - SQL
    - PostgreSQL
    - German
3. If the date style is "SQL" or "PostgreSQL", scan it again to check
   the substyle.
    - European
    - NonEuropean or US
4. make strftime formats according to the date style and substyle and
   set them on the connection as instance variables.
5. convert DateTime, Date and Time to strings using the strftime formats.

http://www.postgresql.org/docs/7.3/static/sql-show.html
http://www.postgresql.org/docs/7.3/static/sql-set.html

From erik at hollensbe.org Thu Oct 2 02:30:57 2008
From: erik at hollensbe.org (Erik Hollensbe)
Date: Wed, 1 Oct 2008 23:30:57 -0700
Subject: [ruby-dbi-users] Proposing patch for sqlite3 driver (very, very long)
In-Reply-To: <5d847bcd0810011754g289c4c76ld6f7cf2632ba9c5e@mail.gmail.com>
References: <5d847bcd0810011754g289c4c76ld6f7cf2632ba9c5e@mail.gmail.com>
Message-ID: <200810012330.57787.erik@hollensbe.org>

On Wednesday 01 October 2008 17:54:10 KUBO Takehiro wrote:
> Hi,
>
> On Wed, Oct 1, 2008 at 8:55 PM, Jarl Friis wrote:
> > So all-in-all I consider my proposal at least as good as the current
> > implementation. Yet I agree that it could be nice with an even better
> > solution. The low-level pg-driver accepts a DateTime, could that be used
>
> How about the following solution?
> 1. execute 'SHOW DateStyle;' just after establishing a connection.
> 2. scan the result to check the date style.
>     - ISO
>     - SQL
>     - PostgreSQL
>     - German
> 3. If the date style is "SQL" or "PostgreSQL", scan it again to check
>    substyle
>     - European
>     - NonEuropean or US
> 4. make strftime formats according to the date style and substyle and
>    set them to the connection as instance variables.
> 5. convert Datetime, Date and Time to string by the strftime formats.

This is very close to what I was thinking, but I didn't have the specifics
down. Thanks, Kubo.

The problem is still a problem however, and that's due to how coarse DBI's
handling of inbound parameter conversion (dbh.do, sth.execute, etc) is.
I knew this was a less than optimal solution, but I didn't realize it would
be such an impediment for issues like this.
It looks like DBI 0.4.1 needs to be released with a solution to this problem.

The rest of this email is damn long; a full description of what I think the
fundamental issues are and some solutions... It's probably not an easy read
even for those interested, and there's no code; this is mostly brainstorming
at this point. I hope to code up a prototype in the next day or two to
further demonstrate this. However, I would appreciate all feedback on the
subject.

The problem is threefold:

1) Part of the benefit of the conversion map is that it can be redefined for
   user-specific needs. Redefining it at this point would potentially
   clobber any user-defined modifications.
2) Inbound conversion maps (outbound is significantly different and actually
   depends on the data being returned, in combination with the mappings
   between database type and ruby type) are static and connection agnostic.
   Therefore, while unlikely, there is the possibility that in this scenario
   a conversion would get overwritten by a second connection to a second
   database with a different date format.
3) This one was evidenced to me last night by the mysql bug reported by
   Georgios Moschovitis; he was using Time.now to represent a full
   timestamp. While Time can do this nearly as well as DateTime (or better,
   depending on who you talk to and what you need), DateTime does not
   inherit from Time, and the rather rudimentary solution I'm using to
   determine what goes in causes him to get only %H:%M:%S instead of the
   full timestamp.

What I'd like to do at this point is discuss a few options. In RFC parlance,
here are what I consider the MUSTs:

1) Types are not only polymorphic between DBDs, but between database
   connections within the DBD
2) A ruby type may represent multiple database types
3) No limitation should be placed on the user to use the predefined type;
   the types should merely make the mundane possible and not limit the user
   from doing the extravagant with their own custom types.
   I'm specifically thinking of postgres, but in reality there are tons of
   specialized types (and probably a few standard ones) we do not support
   among all the known supported databases.

Before I go into possible solutions, I should note that nothing is set in
stone. Feel free to reject all these ideas and propose your own solution;
just keep in mind the 3 things above and that solutions have to serve a
multitude of consumers, not a specific subset of them. DBI is deliberately
defined to mandate as little as possible to allow people to work as they
feel appropriate.

Now, for what I consider the SHOULDs (again, RFC parlance):

1) It would be ideal if inbound (currently the majority of DBI::TypeUtil)
   and outbound (DBI::Type::*) were unified, if only in namespace and API.
   This would make a number of external interfaces much easier to handle.
2) Binding should be able to take extensive advantage of any approach; one
   thing that has been requested a number of times is outbound parameter
   binding, and part of that would have to take advantage of the type
   system.
3) It should not be rocket science to write a type handling class, even if
   the standard ones are necessarily complex.

Now that the bulleted lists are gone... Here's an idea. To summarize (ok, I
promise, last list):

- DBI::Type contains an interface inherited from by all types, which
  contains methods intended to dispatch to the appropriate conversion based
  on ColumnInfo data. These conversions cover both inbound and outbound
  conversions.
- DBI supplies baseline types and expects DBDs to provide any variants from
  those baseline types, and those variants should inherit from their DBI
  counterparts, if any, or at worst DBI::Type itself.
- dbh holds a conversion map of database types to DBI::Type::* classes. This
  map should be mutable by the end user.

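As a rough illustration of the third point in the summary above, here is a
small Ruby sketch of a per-handle, user-mutable type map. It is not DBI's
actual code: the Sketch module and every class and method name in it are
hypothetical stand-ins, and the formats are only examples.

```ruby
require 'date'

# Hypothetical stand-ins for the proposed hierarchy; not DBI's real classes.
module Sketch
  class Type
    # Generic fallbacks. to_type yields a ruby value; from_type takes a ruby
    # value and yields a database-compatible string.
    def self.to_type(obj)
      obj
    end

    def self.from_type(obj)
      obj.to_s
    end
  end

  # Baseline type of the kind DBI itself would supply.
  class Timestamp < Type
    def self.to_type(str)
      DateTime.parse(str)
    end

    def self.from_type(value)
      value.strftime('%Y-%m-%d %H:%M:%S')
    end
  end

  # Driver-side variant inheriting from the baseline type.
  class DayFirstTimestamp < Timestamp
    def self.from_type(value)
      value.strftime('%d-%m-%Y %H:%M:%S')
    end
  end

  class Handle
    DEFAULT_TYPES = { 'timestamp' => Timestamp }.freeze

    attr_reader :type_map

    def initialize
      # Every connection gets its own copy, so edits stay local to it.
      @type_map = DEFAULT_TYPES.dup
    end
  end
end

first  = Sketch::Handle.new
second = Sketch::Handle.new
second.type_map['timestamp'] = Sketch::DayFirstTimestamp

stamp = DateTime.new(2008, 9, 30, 12, 34, 34)
puts first.type_map['timestamp'].from_type(stamp)
puts second.type_map['timestamp'].from_type(stamp)
```

Keeping the map on the handle rather than in a shared constant is what lets
two connections with different date formats coexist without overwriting each
other's conversions.
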
The long-winded version:

The DBI::Type::* namespace could be extended by DBD; ergo,
DBI::Type::Timestamp could be subclassed as DBI::Type::MySQL::Timestamp, and
it would inherit from the base DBI::Type::Timestamp implementation, similar
to how almost all of the DBI::Type classes currently inherit from
DBI::Type::Null. These classes would be the default type classes for each
DBD, and would be defined *with* the DBD (with the base parent classes
provided by DBI as a fallback). These types would be mapped to the database
handle at connection time, and be mutable; the types will be defined in the
DBD only for isolation and maintenance purposes.

These classes, if not intended to fall back to their parent, would implement
at least two class methods, #to_type and #from_type, which would convert to
a ruby type and from it to a database-compatible representation,
respectively. These would be generic (in the context of the DBD) routines
intended for execution only if a more specific conversion fails to exist.

DBI::Type, which all DBI::Type::* classes would inherit from, would
implement #parse and #coerce, which would act as dispatchers. In #parse's
case, it would take both the object and the corresponding ColumnInfo object
for the column; the type_name (which I'm fairly certain is provided by all
DBDs) would be used as the name of a method to call for a specific
conversion of the object; otherwise, the fallback would be used (in
DBI::Type::MySQL::Integer's case this would be D::T::M::Integer.to_type, or
more likely D::T::Integer.to_type, as mysql's integers aren't anything
special).

To get more specific, DBI::Type::Pg::Timestamp could overload #parse to
delegate to 'dmy', 'mdy', 'iso' and similar methods, which would properly
parse into a DateTime object. This data could be gathered with the
ColumnInfo per-execute (allowing for situations where it's been changed) or
on connection.

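The #parse dispatch described above might look something like the following
sketch. Everything here is an assumption made for illustration: ColumnInfo
is a bare Struct rather than DBI's ColumnInfo class, the class names are
invented, and the specific conversions carry a parse_ prefix (instead of the
bare type_name) so a column type can never collide with a built-in method.

```ruby
require 'date'

# Bare stand-in for DBI's ColumnInfo; only type_name matters in this sketch.
ColumnInfo = Struct.new(:type_name)

class SketchType
  # Dispatcher: call the conversion named after the column's type_name when
  # the class defines one, otherwise fall back to the generic to_type.
  def self.parse(obj, column)
    specific = "parse_#{column.type_name.to_s.downcase.tr(' ', '_')}"
    respond_to?(specific) ? public_send(specific, obj) : to_type(obj)
  end

  # Generic fallback: hand the value back untouched.
  def self.to_type(obj)
    obj
  end
end

class SketchTimestamp < SketchType
  # Generic fallback for this type family.
  def self.to_type(str)
    DateTime.parse(str)
  end

  # Specific conversion, used only for columns whose type_name is "date".
  def self.parse_date(str)
    Date.strptime(str, '%Y-%m-%d')
  end
end

p SketchTimestamp.parse('2008-09-30', ColumnInfo.new('date'))
p SketchTimestamp.parse('2008-09-30 12:34:34', ColumnInfo.new('timestamp'))
```

A Pg-specific subclass could add conversions keyed on the date style in the
same way, which is the delegation to 'dmy', 'mdy' and 'iso' suggested above.
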
Likewise, other Timestamp classes could use the base #parse to delegate to
methods that handle types with the timezone and without, which are fairly
common, and even handle integer types, for those who prefer to store epochs
and whatnot.

#coerce would be similar to this in function, but getting the information
used to delegate would be different; the obvious problem being that we don't
have the information at the point where coercion needs to happen, or at
least, most of it. The dbh mapping will need to contain an optional default
conversion that is used for #coerce, and a convenient way for users to alter
that conversion from the pre-packaged ones. Cloning the class to something
anonymous with a change to the default conversion should work; this can be
done in DBI::Type as well.

I'll try and have prototypes for this sometime tomorrow to clarify
confusion, but please, poke/ask/criticize/flame, especially if there are
faults that I haven't accounted for. Understanding how the DBI::Type
namespace currently works will greatly help your comprehension; it's
documented.

-Erik

From erik at hollensbe.org Thu Oct 2 22:50:46 2008
From: erik at hollensbe.org (Erik Hollensbe)
Date: Thu, 2 Oct 2008 19:50:46 -0700
Subject: [ruby-dbi-users] Proposing patch for sqlite3 driver (very, very long)
In-Reply-To: <200810012330.57787.erik@hollensbe.org>
References: <5d847bcd0810011754g289c4c76ld6f7cf2632ba9c5e@mail.gmail.com>
 <200810012330.57787.erik@hollensbe.org>
Message-ID: <200810021950.46760.erik@hollensbe.org>

On Wednesday 01 October 2008 23:30:57 Erik Hollensbe wrote:
> On Wednesday 01 October 2008 17:54:10 KUBO Takehiro wrote:
> > Hi,
> >
> > On Wed, Oct 1, 2008 at 8:55 PM, Jarl Friis wrote:
> > > So all-in-all I consider my proposal at least as good as the current
> > > implementation. Yet I agree that it could be nice with an even better
> > > solution. The low-level pg-driver accepts a DateTime, could that be
> > > used
> >
> > How about the following solution?
> > 1. execute 'SHOW DateStyle;' just after establishing a connection.
>
> The problem is still a problem however, and that's due to how coarse DBI's
> handling of inbound parameter conversion (dbh.do, sth.execute, etc) is
> done. I knew this was a less than optimal solution, but I didn't realize it
> would be such an impediment for issues like this. It looks like DBI 0.4.1
> needs to be released with a solution to this problem.

I know I promised a prototype tonight, but being realistic I just don't
think it's going to happen. Tomorrow is Friday and I imagine I can get
something for review either tomorrow night or Saturday sometime.

I'm going to post a news item on the RF page right now explaining the
current problem with DBD::Pg and solutions to avoid it.

I still want comments, or even full designs that solve these issues; I'm
happy to consider them with equivalent weight against my own proposals, and
would love to hear from uninvolved community members if they have opinions
on what strategies would work best for them.

-Erik

From erik at hollensbe.org Sun Oct 5 16:14:34 2008
From: erik at hollensbe.org (Erik Hollensbe)
Date: Sun, 5 Oct 2008 13:14:34 -0700
Subject: [ruby-dbi-users] Proposing patch for sqlite3 driver (very, very long)
In-Reply-To: <200810021950.46760.erik@hollensbe.org>
References: <200810012330.57787.erik@hollensbe.org>
 <200810021950.46760.erik@hollensbe.org>
Message-ID: <200810051314.34893.erik@hollensbe.org>

On Thursday 02 October 2008 19:50:46 Erik Hollensbe wrote:
> On Wednesday 01 October 2008 23:30:57 Erik Hollensbe wrote:
> > On Wednesday 01 October 2008 17:54:10 KUBO Takehiro wrote:
> > > Hi,
> > >
> > > On Wed, Oct 1, 2008 at 8:55 PM, Jarl Friis wrote:
> > > > So all-in-all I consider my proposal at least as good as the current
> > > > implementation. Yet I agree that it could be nice with an even better
> > > > solution. The low-level pg-driver accepts a DateTime, could that be
> > > > used
> > >
> > > How about the following solution?
> > > 1. execute 'SHOW DateStyle;' just after establishing a connection.
> >
> > The problem is still a problem however, and that's due to how coarse
> > DBI's handling of inbound parameter conversion (dbh.do, sth.execute, etc)
> > is done. I knew this was a less than optimal solution, but I didn't
> > realize it would be such an impediment for issues like this. It looks
> > like DBI 0.4.1 needs to be released with a solution to this problem.
>
> I know I promised a prototype tonight, but being realistic I just don't

A day late and a dollar short:

http://gist.github.com/14922

I think this will handle the problem.... Note that from/to relates to ruby,
not the database, so "from_type" is from a ruby type, not from Pg.

The database handle mock is pretty loose, but I hope the "real world
example" at the bottom will clear things up a bit, demonstrating how these
things would happen inside DBI. I didn't think it was worth filling out a
complete mock of DBI to demonstrate this. :)

Anything (really, anything!) comment-wise would be very helpful here. I'm
not interested in just getting yelled at when this does something people
don't like; let's preempt this, please.

Ideally, a clone/fork of this with your specific comments in edits (unless
they amount to a general "it sucks", which is fine as well) would be
helpful.

-Erik

From jarl at gavia.dk Mon Oct 6 05:24:45 2008
From: jarl at gavia.dk (Jarl Friis)
Date: Mon, 06 Oct 2008 11:24:45 +0200
Subject: [ruby-dbi-users] Proposing patch for sqlite3 driver
In-Reply-To: <95c8612ff811e2a27e44618122df4954@localhost>
 (erik@hollensbe.org's message of "Wed, 01 Oct 2008 13:59:26 -0700")
References: <200809292242.39040.erik@hollensbe.org>
 <200809300748.35952.erik@hollensbe.org>
 <95c8612ff811e2a27e44618122df4954@localhost>
Message-ID:

erik at hollensbe.org writes:

> I guess what I'm saying is, if we do it your way, there will be a bug next
> week with a DMY-only guy having the same problem,

I see your point, and agree that a universal solution is the best thing.

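For reference, the connection-driven approach KUBO Takehiro outlined earlier
in this thread (ask the server for its DateStyle once, then derive strftime
formats from the answer) could be sketched roughly as below. This is only an
illustration under assumptions: timestamp_format_for is a hypothetical
helper, the formats cover just the ISO, SQL and German styles, and the
original "PostgreSQL" style is deliberately left out.

```ruby
require 'date'

# Hypothetical helper: turn the text returned by SHOW DateStyle (for example
# "ISO, MDY" or "SQL, European") into a strftime format string.
def timestamp_format_for(datestyle)
  style, order = datestyle.split(',').map { |part| part.strip.upcase }
  day_first = %w[DMY EUROPEAN].include?(order)

  case style
  when 'ISO'    then '%Y-%m-%d %H:%M:%S'
  when 'GERMAN' then '%d.%m.%Y %H:%M:%S'
  when 'SQL'    then day_first ? '%d/%m/%Y %H:%M:%S' : '%m/%d/%Y %H:%M:%S'
  else raise ArgumentError, "style not covered by this sketch: #{datestyle}"
  end
end

# With a live handle the style string would come from the server, e.g.
#   datestyle = dbh.select_one('SHOW DateStyle')[0]
# Here a few sample answers are hard-coded instead.
stamp = DateTime.new(2008, 9, 30, 12, 34, 34)
['ISO, MDY', 'SQL, DMY', 'SQL, MDY', 'German, DMY'].each do |datestyle|
  puts "#{datestyle}: #{stamp.strftime(timestamp_format_for(datestyle))}"
end
```

Storing the resulting format on the connection, rather than in a shared map,
keeps two handles with different settings from interfering with each other.
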
From jarl at gavia.dk Mon Oct 6 05:37:12 2008
From: jarl at gavia.dk (Jarl Friis)
Date: Mon, 06 Oct 2008 11:37:12 +0200
Subject: [ruby-dbi-users] DBD:Pg requires postgresql to be configured to
 datestyle=mdy (WAS: Proposing patch for sqlite3 driver (very, very long))
In-Reply-To: <200810012330.57787.erik@hollensbe.org> (Erik Hollensbe's
 message of "Wed, 1 Oct 2008 23:30:57 -0700")
References: <5d847bcd0810011754g289c4c76ld6f7cf2632ba9c5e@mail.gmail.com>
 <200810012330.57787.erik@hollensbe.org>
Message-ID:

Dear Erik.

Thanks for the very long explanation of what you have in mind. It
will take some time for me to understand your ideas (I am new to ruby,
and the architecture/design of DBI/DBD is also new to me); it will
probably make more sense when I see a concrete source code suggestion.

Further, it seems like you are trying to solve a much more general
issue than my initial problem regarding DBD::Pg being US-specific. It
all sounds good, but regarding my initial issue, I just want to
emphasize that the underlying pg library accepts the ruby type
DateTime, so in principle there is no need (for this specific
situation and this specific DBD driver) to convert it into a
string.

Jarl

From erik at hollensbe.org Mon Oct 6 11:34:25 2008
From: erik at hollensbe.org (Erik Hollensbe)
Date: Mon, 6 Oct 2008 08:34:25 -0700
Subject: [ruby-dbi-users] DBD:Pg requires postgresql to be configured to
 datestyle=mdy (WAS: Proposing patch for sqlite3 driver (very, very long))
In-Reply-To:
References: <200810012330.57787.erik@hollensbe.org>
Message-ID: <200810060834.26055.erik@hollensbe.org>

On Monday 06 October 2008 02:37:12 Jarl Friis wrote:
> Dear Erik.
>
> Thanks for a very long explanation of what you have in your mind. It
> will take some time for me to understand your ideas (I am new to ruby,
> and the architecture/design of DBI/DBD is also new to me), it probably
> makes more sense when I see a concrete source code suggestion.
>
> Further it seems like you are trying to solve a much more general
> issue than my initial problem regarding DBD:Pg being US-specific. It
> all sounds good, but regarding my initial issue, I just want to
> emphasize that the underlying pg library accepts the ruby type
> DateTime, so in principle there is no need (for this specific
> situation and this specific DBD driver) to convert it into a
> string.

Yes, and if history is any indicator, next week it won't. :)

I hate saying things like that on a mailing list, but sometimes they need to
be said. We're not using the native driver conversion (anywhere) for a
reason.

-Erik

From erik at hollensbe.org Wed Oct 8 11:18:09 2008
From: erik at hollensbe.org (Erik Hollensbe)
Date: Wed, 8 Oct 2008 08:18:09 -0700
Subject: [ruby-dbi-users] Type management changes
Message-ID: <200810080818.09822.erik@hollensbe.org>

http://rubyforge.org/pipermail/ruby-dbi-users/2008-October/000054.html

Any comments? It'd be nice if there was at least some discussion on this
before I attempted to implement it.

-Erik