Hey guys,

I am starting to build an app for a project of mine. The big hype lately seems to be the NoSQL stuff going around, like Cassandra, MongoDB, Redis, etc. What NoSQL databases have you guys been using, and what for? I am thinking of using MySQL + Redis for a future project: MySQL will contain most of the data, and Redis will be used for message queuing, caching, global variables, etc. I am still concerned about NoSQL security. Anyway, what NoSQL implementations have you guys used for your Flask apps?

~Jonathan C.
This came via a timely twitter post and is full of truth. "I did not write this. It sounds like me, but it was not me. Someone has spoken, who is listening?"

-sc

http://pastebin.com/raw.php?i=FD3xe6Jt

> Don't use MongoDB
> =================
>
> I've kept quiet for awhile for various political reasons, but I now
> feel a kind of social responsibility to deter people from banking
> their business on MongoDB.
>
> Our team did serious load on MongoDB on a large (10s of millions
> of users, high profile company) userbase, expecting, from early good
> experiences, that the long-term scalability benefits touted by 10gen
> would pan out. We were wrong, and this rant serves to deter you
> from believing those benefits and making the same mistake
> we did. If one person avoids the trap, it will have been
> worth writing. Hopefully, many more do.
>
> Note that, in our experiences with 10gen, they were nearly always
> helpful and cordial, and often extremely so. But at the same
> time, that cannot be reason alone to suppress information about
> the failings of their product.
>
> Why this matters
> ----------------
>
> Databases must be right, or as-right-as-possible, b/c database
> mistakes are so much more severe than almost every other variation
> of mistake. Not only does it have the largest impact on uptime,
> performance, expense, and value (the inherent value of the data),
> but data has *inertia*. Migrating TBs of data on-the-fly is
> a massive undertaking compared to changing drcses or fixing the
> average logic error in your code. Recovering TBs of data while
> down, limited by what spindles can do for you, is a helpless
> feeling.
>
> Databases are also complex systems that are effectively black
> boxes to the end developer. By adopting a database system,
> you place absolute trust in their ability to do the right thing
> with your data to keep it consistent and available.
>
> Why is MongoDB popular?
> -----------------------
>
> To be fair, it must be acknowledged that MongoDB is popular,
> and that there are valid reasons for its popularity.
>
> * It is remarkably easy to get running
> * Schema-free models that map to JSON-like structures
>   have great appeal to developers (they fit our brains),
>   and a developer is almost always the individual who
>   makes the platform decisions when a project is in
>   its infancy
> * Maturity and robustness, track record, tested real-world
>   use cases, etc, are typically more important to sysadmin
>   types or operations specialists, who often inherit the
>   platform long after the initial decisions are made
> * Its single-system, low concurrency read performance benchmarks
>   are impressive, and for the inexperienced evaluator, this
>   is often The Most Important Thing
>
> Now, if you're writing a toy site, or a prototype, something
> where developer productivity trumps all other considerations,
> it basically doesn't matter *what* you use. Use whatever
> gets the job done.
>
> But if you're intending to really run a large scale system
> on Mongo, one that a business might depend on, simply put:
>
> Don't.
>
> Why not?
> --------
>
> **1. MongoDB issues writes in unsafe ways *by default* in order to
> win benchmarks**
>
> If you don't issue getLastError(), MongoDB doesn't wait for any
> confirmation from the database that the command was processed.
> This introduces at least two classes of problems:
>
> * In a concurrent environment (connection pools, etc), you may
>   have a subsequent read fail after a write has "finished";
>   there is no barrier condition to know at what point the
>   database will recognize a write commitment
> * Any unknown number of save operations can be dropped on the floor
>   due to queueing in various places, things outstanding in the TCP
>   buffer, etc, when your connection drops or the db were to be KILL'd or
>   segfault, hardware crash, you name it
>
> **2. MongoDB can lose data in many startling ways**
>
> Here is a list of ways we personally experienced records go missing:
>
> 1. They just disappeared sometimes. Cause unknown.
> 2. Recovery on corrupt database was not successful,
>    pre transaction log.
> 3. Replication between master and slave had *gaps* in the oplogs,
>    causing slaves to be missing records the master had. Yes,
>    there is no checksum, and yes, the replication status had the
>    slaves current
> 4. Replication just stops sometimes, without error. Monitor
>    your replication status!
>
> **3. MongoDB requires a global write lock to issue any write**
>
> Under a write-heavy load, this will kill you. If you run a blog,
> you maybe don't care b/c your R:W ratio is so high.
>
> **4. MongoDB's sharding doesn't work that well under load**
>
> Adding a shard under heavy load is a nightmare.
> Mongo either moves chunks between shards so quickly it DOSes
> the production traffic, or refuses to move chunks altogether.
>
> This pretty much makes it a non-starter for high-traffic
> sites with heavy write volume.
>
> **5. mongos is unreliable**
>
> The mongod/config server/mongos architecture is actually pretty
> reasonable and clever. Unfortunately, mongos is complete
> garbage. Under load, it crashed anywhere from every few hours
> to every few days. Restart supervision didn't always help b/c
> sometimes it would throw some assertion that would bail out a
> critical thread, but the process would stay running. Double
> fail.
>
> It got so bad the only usable way we found to run mongos was
> to run haproxy in front of dozens of mongos instances, and
> to have a job that slowly rotated through them and killed them
> to keep fresh/live ones in the pool. No joke.
>
> **6. MongoDB actually once deleted the entire dataset**
>
> MongoDB, 1.6, in replica set configuration, would sometimes
> determine the wrong node (often an empty node) was the freshest
> copy of the data available. It would then DELETE ALL THE DATA
> ON THE REPLICA (which may have been the 700GB of good data)
> AND REPLICATE THE EMPTY SET. The database should never never
> never do this. Faced with a situation like that, the database
> should throw an error and make the admin disambiguate by
> wiping/resetting data, or forcing the correct configuration.
> NEVER DELETE ALL THE DATA. (This was a bad day.)
>
> They fixed this in 1.8, thank god.
>
> **7. Things were shipped that should have never been shipped**
>
> Things with known, embarrassing bugs that could cause data
> problems were in "stable" releases--and often we weren't told
> about these issues until after they bit us, and then only b/c
> we had a super duper crazy platinum support contract with 10gen.
>
> The response was to send up a hot patch and that they were
> calling an RC internally, and then run that on our data.
>
> **8. Replication was lackluster on busy servers**
>
> Replication would often, again, either DOS the master, or
> replicate so slowly that it would take far too long and
> the oplog would be exhausted (even with a 50G oplog).
>
> We had a busy, large dataset that we simply could
> not replicate b/c of this dynamic. It was a harrowing month
> or two of finger crossing before we got it onto a different
> database system.
>
> **But, the real problem:**
>
> You might object, my information is out of date; they've
> fixed these problems or intend to fix them in the next version;
> problem X can be mitigated by optional practice Y.
>
> Unfortunately, it doesn't matter.
>
> The real problem is that so many of these problems existed
> in the first place.
>
> Database developers must be held to a higher standard than
> your average developer. Namely, your priority list should
> typically be something like:
>
> 1. Don't lose data, be very deterministic with data
> 2. Employ practices to stay available
> 3. Multi-node scalability
> 4. Minimize latency at 99% and 95%
> 5. Raw req/s per resource
>
> 10gen's order seems to be, #5, then everything else in some
> order. #1 ain't in the top 3.
>
> These failings, and the implied priorities of the company,
> indicate a basic cultural problem, irrespective of whatever
> problems exist in any single release: a lack of the requisite
> discipline to design database systems businesses should bet on.
>
> Please take this warning seriously.

And why I don't recommend MySQL either. You could've s/mongodb/MySQL/goi and the post would have been almost accurate as well.

On Nov 1, 2011, at 3:44 PM, Jonathan Chen wrote:

> Hey guys,
>
> I am starting to build an app for a project of mine. The big hype lately seems to be the NoSQL stuff going around, like Cassandra, MongoDB, Redis, etc. What NoSQL databases have you guys been using, and what for? I am thinking of using MySQL + Redis for a future project: MySQL will contain most of the data, and Redis will be used for message queuing, caching, global variables, etc. I am still concerned about NoSQL security. Anyway, what NoSQL implementations have you guys used for your Flask apps?
>
> ~Jonathan C.

--
Sean Chittenden
sean@chittenden.org
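For anyone skimming the thread, point 1 of the pastebin (writes that return before the database confirms them) can be illustrated with a rough sketch. This is a toy in-memory model, not MongoDB and not any driver's API: `ToyStore`, `write_fire_and_forget`, and `write_acknowledged` are hypothetical names made up for the illustration. It only shows why an unconfirmed write may be invisible to the next read, and why it can vanish if the server dies before applying it.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a database server (not MongoDB)."""

    def __init__(self):
        self.pending = []  # writes received from clients but not yet applied
        self.data = {}     # writes the "server" has actually applied

    def apply_pending(self):
        """Apply every queued write, as the server would once it catches up."""
        for key, value in self.pending:
            self.data[key] = value
        self.pending = []

    def crash(self):
        """Simulate a crash: anything still queued is lost."""
        self.pending = []


def write_fire_and_forget(store, key, value):
    """Queue the write and return immediately, with no confirmation."""
    store.pending.append((key, value))


def write_acknowledged(store, key, value):
    """Queue the write, then confirm it was applied before returning."""
    store.pending.append((key, value))
    store.apply_pending()  # stands in for waiting on the server's reply
    return store.data.get(key) == value


if __name__ == "__main__":
    store = ToyStore()

    write_fire_and_forget(store, "a", 1)
    print(store.data.get("a"))  # the write "finished" but is not readable yet
    store.crash()
    print("a" in store.data)    # the queued write was dropped by the crash

    confirmed = write_acknowledged(store, "b", 2)
    store.crash()
    print(confirmed, store.data.get("b"))  # confirmed write survives the crash
```

The acknowledged path costs a round trip per write, which is the trade-off the pastebin says the defaults avoid in order to look good in benchmarks.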
MongoDB's CTO has typed a rebuttal to that post on pastebin:

http://news.ycombinator.com/item?id=3202959

On Sun, Nov 6, 2011 at 9:41 AM, Sean Chittenden <sean@chittenden.org> wrote:

> This came via a timely twitter post and is full of truth. "I did not write
> this. It sounds like me, but it was not me. Someone has spoken, who is
> listening?"
>
> -sc
>
> http://pastebin.com/raw.php?i=FD3xe6Jt
Re: original pastie...

I don't care much for idealistic rants for or against any tech. It all seemed too cranked up to 11, and overall a lot of the HN crowd chimed in with somewhat cooler heads.

What I don't get is that the original pastebin entry got so much play even though it came way out of left field and completely anon. I guess some people like to see the world burn.

Back to my projects, I guess... with Mongo, no less.

On Nov 6, 2011 4:47 PM, "Cheng-Han Lee" <lee.chenghan@gmail.com> wrote:

> MongoDB's CTO has typed a rebuttal to that post on pastebin:
>
> http://news.ycombinator.com/item?id=3202959
Hi,

On 2011-11-07 2:21 AM, Arek Bochinski wrote:

> What I dont get, is that the original pastebin entry got so much play
> even though it came way out of left field and completely anon.

Well. MongoDB's general problems were not exactly unknown. I just wonder why it came up now since I think many of them were resolved.

Regards,
Armin
Don't mean to push this thread more OT, but Sean, which DB would you recommend for large scale then? Preferably open source.

Thanks,
Ford, ford.anthonyj@gmail.com

On Nov 6, 2011 11:42 AM, "Sean Chittenden" <sean@chittenden.org> wrote:

> This came via a timely twitter post and is full of truth. "I did not write
> this. It sounds like me, but it was not me. Someone has spoken, who is
> listening?"
>
> -sc
>
> http://pastebin.com/raw.php?i=FD3xe6Jt
For databases, PostgreSQL. -sc -- Sean Chittenden On Nov 6, 2011, at 10:11, Anthony Ford <ford.anthonyj@gmail.com> wrote: > Don't mean to push this thread more OT, but Sean, which DB would you recommend for large scale then? Preferably open source. > > Thanks, > Ford, > ford.anthonyj@gmail.com > > On Nov 6, 2011 11:42 AM, "Sean Chittenden" <sean@chittenden.org> wrote: > This came via a timely twitter post and is full of truth. "I did not write this. It sounds like me, but it was not me. Someone has spoken, who is listening?" > > -sc > > http://pastebin.com/raw.php?i=FD3xe6Jt
SQL ftw On 1 November 2011 22:44, Jonathan Chen <tamasiaina@gmail.com> wrote: > Hey guys, > > So I am starting to build up an app for project of mine. I think the big > hype lately is the NoSQL stuff that is going around like Cassandra, > MongoDB, Redis, etc. What NoSQL databases have you guys been using and what > for? I am thinking of using Mysql + Redis for a future project that I am > working on. Where MySQL will contain most of the data and redis will be > used for message queuing, caching, global variables, etc. I am still > concerned with NoSQL security. But anyways what NoSQL implementations have > you guys used for your flask apps? > > ~Jonathan C. >
Hi, Use Postgres + Redis :-) Postgres because it's stable and fast and generally your data is relational. Except when it's not. When it's not, Redis is the tool for the job. Regards, Armin
Redis is great, and the Redis author, antirez, has just started a site named lamernews.com as an example showcase. Its source is available, albeit in Ruby. For my own needs, I go with MySQL since I know it well and can get around the shortcomings. My last two projects, including the current one, were paired with Mongo as well. I looked at the ease of use and capability of its geographical queries and found it to be very balanced. There is one thing about Mongo that you need to know: if you think your dataset will grow above 2GB, you will want it to be on a 64-bit system, which is a non-issue most of the time. On Thu, Nov 3, 2011 at 9:49 PM, Armin Ronacher <armin.ronacher@active-4.com> wrote: > Hi, > > Use Postgres + Redis :-) Postgres because it's stable and fast and > generally your data is relational. Except when it's not. When it's not > Redis is the tool for the job. > > > Regards, > Armin >
I've been using SimpleDB, although that ties you to Amazon hosting (a natural choice for me as I'm hosting on EC2). I was wondering if anyone has an opinion on Redis vs SimpleDB (or Mongo or Couch vs SimpleDB). On Thu, Nov 3, 2011 at 8:20 PM, Arek Bochinski <zeeero.coool@gmail.com> wrote: > Redis is great and redis author, antirez, has just started a site named > lamernews.com as an example showcase. > Its source is available albeit in Ruby. > For my own needs, I go with MySQL since I know it well and can get around > the shortcomings. Last two projects, including current one, were paired with > mongo as well. I looked at the geographical query ease of use and > capability and found it to be very balanced. There is a thing about Mongo > that you need to know, and > that is, if you think your dataset gets above 2GB, you will want it to be on a > 64-bit system, which is a non-issue > most of the time.
I'm working with a graph database at the moment, Neo4j. It is quite amazing: it is able to do very advanced queries with the query language Gremlin (https://github.com/tinkerpop/gremlin/wiki, where you can watch screencasts). I wonder why we don't hear about this technology more often... On Nov 4, 2011 at 20:26, "John Fries" <john.a.fries@gmail.com> wrote: > I've been using simpledb, although that ties you to amazon hosting > (natural choice for me as i'm hosting on ec2). Was wondering if anyone > has an opinion on redis vs simpledb (or mongo or couch vs simpledb)
As an FYI, you can solve graph problems reasonably efficiently using recursive queries in PostgreSQL (via CTEs). In a month or two I'll be giving a talk about this and will post the notes in case people are interested*. Keeping the number of "sources of truth" in a given environment to the lowest number possible should be a primary design goal because it reduces cost and complexity and increases the rate of development. Seeing organizations fragment their knowledge base across technologies because they aren't aware of the capabilities of the tools at their disposal kills me. -sc * This is relevant because I use CTEs with SQLAlchemy to do distance vector calculations in SQL as a query and then iterate through the results in Flask. Doing BGP in PostgreSQL... why? Because I can, that's why. -- Sean Chittenden On Nov 5, 2011, at 4:36, Nicolas Clairon <clairon@gmail.com> wrote: > I'm dealing at the moment with graph database with neo4j. This is quite amazing, it able to do very advanced queries with the query language Gremlin ( https://github.com/tinkerpop/gremlin/wiki you can watch screencasts). I wonder why we don't hear about this technology more often...
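The recursive-query idea mentioned above can be sketched roughly as follows. This is only an illustration, not material from the talk: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (both accept WITH RECURSIVE), and the `edges` table, its `src`/`dst` columns, and the `shortest_hops` helper are hypothetical names chosen for the example.

```python
import sqlite3

def shortest_hops(conn, start):
    """Return {node: minimum hop count} for every node reachable from start."""
    # A shortest path never needs more hops than there are edges, so this
    # bound stops the recursion even when the graph contains cycles.
    max_hops = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node, hops) AS (
            SELECT ?, 0                      -- seed row: the start node
            UNION
            SELECT e.dst, r.hops + 1         -- follow one more edge
            FROM reach AS r
            JOIN edges AS e ON e.src = r.node
            WHERE r.hops < ?
        )
        SELECT node, MIN(hops) FROM reach GROUP BY node
        """,
        (start, max_hops),
    ).fetchall()
    return dict(rows)

# Hypothetical sample graph: a -> b -> c -> a (a cycle), plus a -> d -> e.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("d", "e")],
)
print(shortest_hops(conn, "a"))
```

The graph stays in the same database as the rest of the data, so there is still only one source of truth; a real distance vector calculation would presumably carry edge weights as well, which this sketch leaves out.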
> Seeing organizations > fragment their knowledge base across technologies because they aren't aware > of the capabilities of the tools at their disposal kills me. -sc Funny, what kills me is organizations trying to shoehorn everything into their existing technologies even when those technologies are woefully inadequate at handling it. RDBMSs are not suitable for every task and neither are NoSQL systems. Cheers, Lars
Yeah, but you can't blame organizations for trying to shoehorn everything in. It's not cheap to use new technology. At least they are trying to save money, while I see a lot of organizations do crazy stuff for publicity and marketing. ~Jonathan C. On Mon, Nov 7, 2011 at 11:35 AM, Lars Hansson <romabysen@gmail.com> wrote: > > Seeing organizations > > fragment their knowledge base across technologies because they aren't aware > > of the capabilities of the tools at their disposal kills me. -sc > > Funny, what kills me is organizations trying to shoehorn everything > into their existing technologies even when those technologies are > woefully inadequate at handling it. RDBMSs are not suitable for every > task and neither are NoSQL systems. > > Cheers, > Lars >
Saving money is no excuse for doing the wrong thing, especially not when it usually ends up more expensive anyway. It seems people always have an excuse ready though, no matter whether they're needlessly spending on the latest whizbang or using the wrong technology for the task at hand. Cheers, Lars On Tue, Nov 8, 2011 at 3:39 AM, Jonathan Chen <tamasiaina@gmail.com> wrote: > Yeah, but you can't blame organizations trying to shoehorn everything in. > Its not cheap to use new technology. At least they are trying to save money > while I see a lot of organization do crazy stuff for publicity and > marketing. > > ~Jonathan C.
> As an FYI, you can solve graph problems reasonably efficiently using > recursive queries in PostgreSQL (via CTEs). In a month or two I'll be > giving a talk about this and will post the notes in case people are > interested*. I'm looking forward to reading or watching this talk. > Keeping the number of "sources of truth" in a given environment What do you mean by "sources of truth"? > Seeing organizations fragment their knowledge base across technologies > because they aren't aware of the capabilities of the tools at their disposal Still, there are matching use cases for Redis. I wonder whether it would be possible to trick PostgreSQL, with a memory database of complex objects and pl/sql queries, into doing what Redis, I assume, already does very well. I hope you see my point.
> Keeping the number of "sources of truth" in a given environment > > What do you mean by "sources of truth"? An authoritative repository of data. You can always delegate truth, but never share authority of truth. You can have non-authoritative truth in a system. Memcached, MySQL, whatever, who cares. In development, I pickle out a hash, then ORM it up once I'm squared away and start using Pg. Non-authoritative sources of truth are put in place as *optimizations* or tactics to handle scalability problems *once you _need_ to scale*. But the truth of it all is, we're not doing this on P133s with 5200 rpm drives and 128MB of RAM (those were entertaining days, however). Scaling vertically works very well and is the *cheapest* way to scale. Don't prematurely optimize. :-) > > Seeing organizations fragment their knowledge base across technologies because they aren't aware of the capabilities of the tools at their disposal > > Still, there are matching use cases for Redis. I wonder if it is not possible to trick Postgresql with a memory database of complex objects, pl/sql queries to do what Redis does, I assume, already very well. > > I hope you see my point. I do, and yes. See Pg's FDW/MED support. Complex objects are frequently managed natively in Pg via arrays, hstore, or by creating custom data types. Custom data types kick ass, btw. Once your app is baked and you know wtf you're doing, this is a neat way to store and manage data (if array and hstore don't solve your problem). -sc -- Sean Chittenden
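The split between an authoritative repository and a non-authoritative copy could look something like the sketch below. It is an assumption-laden illustration rather than anyone's production setup: sqlite3 stands in for Pg as the single authority, a plain dict stands in for something like Memcached or Redis, and the `UserStore` class with its `save` and `load` methods is a hypothetical name invented for the example.

```python
import sqlite3

class UserStore:
    """Hypothetical wrapper: one authoritative table plus a disposable cache."""

    def __init__(self):
        # Authoritative source of truth (stand-in for PostgreSQL).
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        # Non-authoritative copy (stand-in for Memcached or Redis).
        self.cache = {}

    def save(self, user_id, name):
        # Write to the authority first, then drop any stale cached copy.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                (user_id, name),
            )
        self.cache.pop(user_id, None)

    def load(self, user_id):
        # Serve from the cache when possible, otherwise ask the authority.
        if user_id in self.cache:
            return self.cache[user_id]
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        name = row[0] if row else None
        if name is not None:
            self.cache[user_id] = name
        return name

store = UserStore()
store.save(1, "alice")
first = store.load(1)    # cache miss: read from the database, then cached
store.save(1, "bob")     # authority updated, cached copy invalidated
second = store.load(1)
store.cache.clear()      # losing the cache loses no data
third = store.load(1)
print(first, second, third)
```

The cache here is purely an optimization: every write goes to the database, and the dict can be thrown away at any time without changing what `load` returns.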
I was a "cassandra" uproar from users, but it seems not. Those MongoDB "uncertainties" are a real downside. I, too, always go with PostgreSQL. I already toyed with DB4O and it was a good database, by all means. Sadly, or not, it seems no one uses buzhug <http://buzhug.sourceforge.net/>. -- "Arrogance is the weapon of the weak." =========================== Italo Moreira Campelo Maia Bachelor of Computer Science - UECE Web and Desktop Developer (Java, Python, Lua) Coordinator of Pug-CE ----------------------------------------------------- http://www.italomaia.com/ http://twitter.com/italomaia/ http://eusouolobomau.blogspot.com/ ----------------------------------------------------- Turtle Linux 9.10 - http://tiny.cc/blogturtle910 Turtle Linux 10.10 - http://bit.ly/cEw4ET ===========================
*I was expecting 2011/11/6 Italo Maia <italo.maia@gmail.com> > I was a "cassandra" uproar from users, but it seems not. Those mongodb > "uncertities" are a really downside. I, too, always go with postgresql. I > already toyed with DB4O and it was a good database, by all means. Sadly, or > not, it seems noone uses buzhug <http://buzhug.sourceforge.net/>.
I've never heard of buzhug. It looks pretty nice, though. Actually, looking through the examples, it seems to bring to Python what LINQ brings to .NET in terms of querying data (though not exactly, since LINQ will work over *any* data set and is not tied to a particular database). I think I'll be playing with this in the near future. Does anyone have any experience with it?

On Sun, Nov 6, 2011 at 3:30 PM, Italo Maia <italo.maia@gmail.com> wrote:
> *I was expecting
>
> 2011/11/6 Italo Maia <italo.maia@gmail.com>
>
>> I was a "cassandra" uproar from users, but it seems not. Those mongodb
>> "uncertities" are a really downside. I, too, always go with postgresql. I
>> already toyed with DB4O and it was a good database, by all means. Sadly, or
>> not, it seems noone uses buzhug <http://buzhug.sourceforge.net/>.
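For anyone who hasn't used LINQ, here is a rough sketch of the kind of query the comparison above is about. It deliberately uses only plain Python generator expressions over an in-memory list, not buzhug's own API (which I haven't verified, so it isn't shown). The `users` records and their field names are made up for the example.

```python
from operator import itemgetter

# Hypothetical in-memory records; names and fields are invented for illustration.
users = [
    {"name": "ana", "age": 31, "city": "Fortaleza"},
    {"name": "bo", "age": 19, "city": "Shanghai"},
    {"name": "cy", "age": 45, "city": "Fortaleza"},
]

# Roughly the LINQ query
#   from u in users where u.age > 30 orderby u.age select u.name
# written as a lazy filter followed by a sort and a projection.
adults = (u for u in users if u["age"] > 30)
names_by_age = [u["name"] for u in sorted(adults, key=itemgetter("age"))]

print(names_by_age)
```

The point is only that filter/sort/project reads declaratively in Python too; a database-backed library would presumably push the same kind of expression down to its storage layer.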
Our use cases and data model are more or less aligned with a non-relational structure, which makes MongoDB a good candidate. We now use it for both data and queueing (via Celery); so far, so perfect. It's better to understand the cons of NoSQL and see whether it fits your models and requirements before diving in.

On Wed, Nov 2, 2011 at 6:44 AM, Jonathan Chen <tamasiaina@gmail.com> wrote:
> What NoSQL databases have you guys been using and what
> for?
MongoDB and CouchDB are the big players. We use Mongo.

The MongoDB site has a pretty unbiased comparison of the two:
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

Craig Younkins

On Tue, Nov 1, 2011 at 8:00 PM, Soj <sojin.v@gmail.com> wrote:
> It's better to understand the cons of NoSQL and see whether it fits your
> models and requirements before diving in.
Here are two pretty good breakdowns of the big NoSQL players in the market:

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
http://perfectmarket.com/blog/not_only_nosql_review_solution_evaluation_guide_chart

However, it really comes down to what you are using NoSQL for. Different solutions fit different needs. I am currently using MongoDB, and I am really happy with the flexibility and performance.

On Tue, Nov 1, 2011 at 6:46 PM, Craig Younkins <cyounkins@gmail.com> wrote:
> MongoDB and CouchDB are the big players. We use Mongo.
From what I've read about MongoDB, once you actually get a good-sized load of data into the DB, the database gets large quite fast.

~Jonathan C.

On Tue, Nov 1, 2011 at 6:54 PM, Cheng-Han Lee <lee.chenghan@gmail.com> wrote:
> I am currently using MongoDB, and I am
> really happy with the flexibility and performance.
Hi,

What about ZODB? I currently use it, but I am not sure about its scalability, and the documentation is not that thorough.

Thanks!

2011/11/2 Craig Younkins <cyounkins@gmail.com>
> MongoDB and CouchDB are the big players. We use Mongo.

--
Yi-Xin Liu, PHD
Department of Macromolecular Science
Fudan University
Room 415, Yuejing Building
Handan Rd. 220, Shanghai, China
Tel +86-021-65642863
Mobile +86-13916819745
http://www.mendeley.com/profiles/yi-xin-liu/
PostgreSQL with CTEs and ARRAY[]s. Skip the NoSQL movement until it starts to resemble the maturity of the OODB universe (which has existed since the 80s but, *shock*, imagine that, no one uses outside of a few good use cases, and with good reason). In the meantime, here's a healthy and mandatory dose of debunking the NoSQL performance FUD:

http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html

That's 225k *TPS* (transactions per second), not QPS (queries per second). -sc

> What NoSQL databases have you guys been using and what for? I am thinking of using Mysql + Redis for a future project that I am working on.
>
> ~Jonathan C.

--
Sean Chittenden
sean@chittenden.org
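For readers who haven't met CTEs, here is a minimal sketch of a recursive one, which covers the tree-shaped data that often pushes people toward a document store. To keep it runnable without a server it uses SQLite from the Python standard library as a stand-in, not PostgreSQL. The `WITH RECURSIVE` query should carry over to PostgreSQL with little or no change, but that is an assumption, and ARRAY[] is PostgreSQL-specific and not shown. The `category` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "databases"), (2, 1, "relational"), (3, 1, "nosql"), (4, 3, "document")],
)

# Recursive CTE: start at the root row, then repeatedly join children onto
# the rows found so far, building a slash-separated path for each node.
rows = conn.execute(
    """
    WITH RECURSIVE tree(id, path) AS (
        SELECT id, name FROM category WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, tree.path || '/' || c.name
        FROM category AS c JOIN tree ON c.parent_id = tree.id
    )
    SELECT path FROM tree ORDER BY path
    """
).fetchall()

paths = [row[0] for row in rows]
print(paths)
conn.close()
```

The whole hierarchy comes back from one query, with no application-side loop and no second data store.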
We use MongoDB for the main data and Redis for the queue backend. It's working well so far.

On Tue, Nov 1, 2011 at 6:53 PM, Sean Chittenden <sean@chittenden.org> wrote:
> PostgreSQL with CTE's and ARRAY[]'s. Skip the NoSQL movement until it
> starts to resemble the maturity of the OODB universe