28 Oct, 2008, Grimble wrote in the 1st comment:
Votes: 0
I'm curious as to why some MUDs choose a DB to store their world data.

I've gone with the traditional flat-file approach, using XML and the Xerces parser. Boost has a couple of other viable options like the property_tree and serialization libraries. I guess I like the idea of the data being human-readable, and that it can be updated with any simple editor.

Other than the customizable search capabilities offered by a DB, what are the advantages?

-Grimble
28 Oct, 2008, Zeno wrote in the 2nd comment:
Votes: 0
Let's say I wanted to integrate my MUD into my forums/webpage. MySQL would make that easier.

Or if I wanted to connect two MUDs. MySQL would help there too.
28 Oct, 2008, David Haley wrote in the 3rd comment:
Votes: 0
Databases also typically offer some kind of reliability. For instance if something crashes halfway through a write, a (good) database implementation won't be thrown into some weird inconsistent state.
28 Oct, 2008, Kelvin wrote in the 4th comment:
Votes: 0
Flatfiles can get unwieldy really quick as they grow in size. My game tracks just about every statistic imaginable dating back to its first day five years and three months ago. We use a lot of the old logged info to generate stats that are used for display or for calculating things to do with the econ or factions. Looking back on five years of data from a flatfile can get to be really unwieldy really quick.

For us, the question the exact opposite: Why on earth would we want to use flatfiles?
28 Oct, 2008, Kelvin wrote in the 5th comment:
Votes: 0
Grimble said:
I guess I like the idea of the data being human-readable, and that it can be updated with any simple editor.

As an addendum, you can edit stuff in a DB with the help of any number of simple administrative utilities. phpMyAdmin in the case of MySQL, or pgAdmin in the case of postgres. SQLite has tons of lightweight apps like this as well (SQLite being an excellent choice for MUDs).
28 Oct, 2008, David Haley wrote in the 6th comment:
Votes: 0
Kelvin said:
For us, the question the exact opposite: Why on earth would we want to use flatfiles?

Well, I wouldn't go that far. Flat files are much simpler and never require server processes etc. to run (unlike some databases). Flat files get a bad rap because most people use them without proper support libraries and treat them as "flat" in the code as well as on the disk. A database, especially one such as SQLite, is after all absolutely nothing more than an interface on top of a flat file…
28 Oct, 2008, Grimble wrote in the 7th comment:
Votes: 0
DavidHaley said:
Databases also typically offer some kind of reliability. For instance if something crashes halfway through a write, a (good) database implementation won't be thrown into some weird inconsistent state.


I'm not as concerned about this given a validation pass of the file (with an XML schema in my case) and a two-step save process where the data is written to a new file before the old file is replaced with it. But yes, I've had to implement this whereas it's typically a built-in capability for a DB.
28 Oct, 2008, Grimble wrote in the 8th comment:
Votes: 0
Kelvin said:
Flatfiles can get unwieldy really quick as they grow in size. My game tracks just about every statistic imaginable dating back to its first day five years and three months ago. We use a lot of the old logged info to generate stats that are used for display or for calculating things to do with the econ or factions. Looking back on five years of data from a flatfile can get to be really unwieldy really quick.


To some extent, I think this depends on how you're storing your world across flat files. The traditional approach of a single file per zone that specifies every room, exit, mob, object, script, etc is definitely a burden. The opposite would be to put every object into its own file, and potentially even having the directory structure mirror the object hierarchy.

But yes, if you're recording years worth of statistical data per object, I'd be inclined to dump that into an associated file rather than the object file itself.

As an aside, how adaptive is a DB to object versioning? At the top of every file, a version field can be used to determine how the rest of the file needs to be parsed. What do you do if your world is in a DB? This may be another built-in capability of a DB, but I'm not sure about that.
28 Oct, 2008, David Haley wrote in the 9th comment:
Votes: 0
XML schema validation only helps detect problems as you encounter them during reading; I was referring to not having the problem show up in the first place by being extremely careful with how/when you save data. But it looks like you've addressed that as well with your two-step save.
28 Oct, 2008, David Haley wrote in the 10th comment:
Votes: 0
Grimble said:
As an aside, how adaptive is a DB to object versioning? At the top of every file, a version field can be used to determine how the rest of the file needs to be parsed. What do you do if your world is in a DB? This may be another built-in capability of a DB, but I'm not sure about that.

You can change a DB schema over time, although it is somewhat harder to do. It's not that hard to add columns, but it's hard to remove them or change what they mean depending on when the row was written.

Grimble said:
The traditional approach of a single file per zone that specifies every room, exit, mob, object, script, etc is definitely a burden

The "traditional approach" is mainly to have somewhat crummy reading/in-memory storage routines. The file format is almost incidental; the real pain comes because the file is not read into a representational format before being converted to whatever end-result data structure it's meant for.
28 Oct, 2008, elanthis wrote in the 11th comment:
Votes: 0
DavidHaley said:
Databases also typically offer some kind of reliability. For instance if something crashes halfway through a write, a (good) database implementation won't be thrown into some weird inconsistent state.


Neither would a good file-based implementation.

For single file writes (where consistency with the data in other files is not critical), the usual approach is to write to a temporary file, flush the output before closing, then using rename() to replace the original file.

When you have multiple files, it's a little trickier, but not impossible. Create a temporary directory of all files with updates, rename() the old directory out of the way into a temp location, rename() the new directory into place, and then delete the old directory. If a crash happens during this process, the app can detect it on startup and recover the old directory from the temp location.

Grimble said:
Other than the customizable search capabilities offered by a DB, what are the advantages?


The biggest advantage is being able to access and search data with 100% accuracy that isn't in active use. For example, if use flat files and you want to get stats about how many long_sword objects are in existence, you need to deal with the problem of player inventories. You don't want to keep every player loaded into memory at all times, but you don't want to have to read and parse every player file each time you need that statistic. You end up needing to create caches and worry about keeping them up-to-date. With a DB, you can just do a single query.

I end up using flat files just because SQL is such a pain in the butt to work with for things like a MUD. If your code uses objects to represent data, you either need a good ORM or you end up using the DB as just a storage mechanism and then you lose out on the real-time statistics features. On the other hand, applications that try to keep all the data in the DB and using individualized SQL queries everywhere for data manipulation tend to get unwieldly and unmaintainable pretty fast.

If you're developing in C or C++, there aren't many good ORM solutions out there. Python and Ruby both have some excellent ones, so if you're lucky enough to be developing in one of those languages, using a DB might make far more sense.
28 Oct, 2008, David Haley wrote in the 12th comment:
Votes: 0
elanthis said:
Neither would a good file-based implementation. (…)

The difference is that the DBMS will handle this concurrently, with reads and writes from several processes. Life gets harder with (ye olde average) flat file implementations. Obviously not impossible, but the DBMS provides it for you whereas you have to do it yourself if you opt for your own file format.

elanthis said:
You end up needing to create caches and worry about keeping them up-to-date. With a DB, you can just do a single query.

I'm assuming that you mean that the database is providing this functionality for you. It's either keeping things in memory all the time (cached or not) or loading them up on demand.

This is what I mean about the problem with flat files being crummy interfaces rather than the storage mechanism. With a good interface, everything that goes on under the hood can be kept hidden away – just like with a database, actually. If you think about it, the more work you do here, the closer you're getting to a database (of sorts) anyhow. As I said earlier a database is nothing more than some interface to accessing data stored in some manner you typically don't care about…

elanthis said:
On the other hand, applications that try to keep all the data in the DB and using individualized SQL queries everywhere for data manipulation tend to get unwieldly and unmaintainable pretty fast.

Not to mention that you'll get killed performance-wise once you start doing anything particularly interesting. You eventually need some kind of direct-access memory cache instead of having to make a look-up every single time.

EDIT: perhaps part of what you mean by "good" ORM is that it would take care of this caching for you, but I wasn't sure what you meant from just the above sentence.
28 Oct, 2008, Grimble wrote in the 13th comment:
Votes: 0
elanthis said:
The biggest advantage is being able to access and search data with 100% accuracy that isn't in active use. For example, if use flat files and you want to get stats about how many long_sword objects are in existence, you need to deal with the problem of player inventories. You don't want to keep every player loaded into memory at all times, but you don't want to have to read and parse every player file each time you need that statistic. You end up needing to create caches and worry about keeping them up-to-date. With a DB, you can just do a single query.


Yes, but won't the DB itself end up doing something along the lines of what you've described? That is, with potentially thousands of player files, it's hard to see how the DB is going to instantly return a statistic like the one described above.

It is indeed a benefit that a DB has visibility into all objects, whether currently active or not, but I do think inactive objects are largely limited to offline players. And even with a DB, player related statistics may be most efficiently served by making a pass through the player objects during MUD initialization, and then maintaining those statistics from that point on.
28 Oct, 2008, David Haley wrote in the 14th comment:
Votes: 0
Grimble said:
It is indeed a benefit that a DB has visibility into all objects

That's the thing, though: you could just as easily do this with your own index implementation over flat files. This isn't a DB-specific feature.
28 Oct, 2008, Runter wrote in the 15th comment:
Votes: 0
To put my 2 cents in, databases are nice. I've tended to use SQL based technology in recent times.

However, flat files do have their place. I think the most common misconceptions in the "flat file" vs db is that you have to use one or the other and that you have to save data the way most stock muds do. It's quite possible to implement a simple databasing system into a "flatfile" to meet many of your needs already mentioned. More work, yes. And I'm all about not doing more work. (But probably not as much work as people would imagine.)
28 Oct, 2008, Vassi wrote in the 16th comment:
Votes: 0
Grimble said:
elanthis said:
The biggest advantage is being able to access and search data with 100% accuracy that isn't in active use. For example, if use flat files and you want to get stats about how many long_sword objects are in existence, you need to deal with the problem of player inventories. You don't want to keep every player loaded into memory at all times, but you don't want to have to read and parse every player file each time you need that statistic. You end up needing to create caches and worry about keeping them up-to-date. With a DB, you can just do a single query.


Yes, but won't the DB itself end up doing something along the lines of what you've described? That is, with potentially thousands of player files, it's hard to see how the DB is going to instantly return a statistic like the one described above.

It is indeed a benefit that a DB has visibility into all objects, whether currently active or not, but I do think inactive objects are largely limited to offline players. And even with a DB, player related statistics may be most efficiently served by making a pass through the player objects during MUD initialization, and then maintaining those statistics from that point on.


I think you're looking at it from the wrong point of view. You're asking 'what can DBs do that flatfile can't?' and the answer is nothing, really. If you already have a beefy flat-file system in place you're only looking at slightly more convenience on the search and store side. A database can let you split up your data more elegantly (read-nightmarishly) and then retrieve it later using joins or views. Even so, unless you have an ORM, you're looking to recreate a good deal of your object access code and - to be honest - if you don't already know SQL then the chances of you creating terribly inefficient SQL queries (such as loading all of your objects in a for-loop with one query per object) are pretty high.

One thing I'm glad I didn't see in this thread was performance time. With a good DB and connection pooling it is hardly worth considering on a decent machine, that is assuming you're using good queries.

Edit: A flat-file system can still be exposed as a service to most websites and stuff, so even there a DB isn't necessary. Of course, that does mean you need a mini http server or you need to handle a special kind of socket connection, etc.
28 Oct, 2008, quixadhal wrote in the 17th comment:
Votes: 0
The big win with an SQL database is the ability to do reporting against the live data. There are other benefits, but it depends on how your program your storage code. It's usually easier to read/write live state changes to a database than a flat file, unless you come up with a scheme for indexing into the files – and then you may as well use BerkeleyDB or something similar. There are also costs, you have to consider how much your game driver wants to cache reads and writes, vs. how much the database can or will do so…. the more you cache, the faster you go… but… the less the database is a live reflection of game state.

I think the query engine is the biggest win though, as it allows you to ask questions like "How many players have killed a mob this week that's given them more than 1/2 a level of experience?", or "Show me the mobs that have died the fastest, sorted by descending level". Obviously, you have to log data to support those queries (combat length, rewards, etc), but you'd still have to do that with flat files… you just also have to write a log parser and analyzer to ask the same questions.
28 Oct, 2008, David Haley wrote in the 18th comment:
Votes: 0
I think that the summary here is that you can do anything with both systems, although if you want to do data-related things like queries, doing it yourself means reimplementing much of what the big database systems already provide. In other words, the more you want to do with your flat file format, the more it will start looking like a database implementation.

I think the best solution is to figure out what you want to do with your data, and only then start evaluating database systems, flat file formats, etc.

But if you do go with a flat file format (which I would probably do (see below)), I still strongly recommend wrapping it in sane interfaces.

I would go with a flat file because I don't need real-time database records. If/when I wanted to do analysis with queries, it would be extremely easy to write a script to dump log entries into a table somewhere. I could even do that on the fly from the game if needed. This way I get the flexibility of controlling my interface instead of having to talk to something else. I think a full DB server is a rather heavy requirement to add on to a MUD.
28 Oct, 2008, elanthis wrote in the 19th comment:
Votes: 0
DavidHaley said:
elanthis said:
Neither would a good file-based implementation. (…)

The difference is that the DBMS will handle this concurrently, with reads and writes from several processes. Life gets harder with (ye olde average) flat file implementations. Obviously not impossible, but the DBMS provides it for you whereas you have to do it yourself if you opt for your own file format.


That is assuming that you're using a networked DBMS like that. There are plenty of DBMS that don't offer any concurrency, since they're nothing more than a fancy interface for modifying trees of data stored in a binary file. Quite a few Java DBMS libraries – as well as the ever-popular SQLite – that have no separate server component.

DavidHaley said:
I'm assuming that you mean that the database is providing this functionality for you. It's either keeping things in memory all the time (cached or not) or loading them up on demand.


Not necessarily. An RDBMS can make quite a few interesting optimizations with your data that are not easily done using a custom flat file approach, such as indexing only the portions of data that you are frequently querying. This is in part accomplished using a query cache, partly by using indexes, and partly by table structure.

If you wanted to get a list of all items of the "long_sword" instance with flat files, you might need to load up and parse eveyr player file. With a DBMS, you could just query the items table (and probably using an index).

DavidHaley said:
Not to mention that you'll get killed performance-wise once you start doing anything particularly interesting. You eventually need some kind of direct-access memory cache instead of having to make a look-up every single time.


I tend to avoid the performance claims like that, given how meaningless they can be. Given the average size of a MUD these days, performance just isn't going to be an issue in that scenario, assuming properly optimized queries are used along with proper indexes. Your DBMS might actually even just be an in-memory representation – SQLite can do that, as can several other DBMS libraries. One might as well claim that using Ruby or Python is going to be a performance problem instead of just using C.

Grimble said:
Yes, but won't the DB itself end up doing something along the lines of what you've described? That is, with potentially thousands of player files, it's hard to see how the DB is going to instantly return a statistic like the one described above.


It won't be instant. It'll just be significantly faster than loading and parsing several hundred or several thousand files, or iterating over a list of live objects.

Grimble said:
It is indeed a benefit that a DB has visibility into all objects, whether currently active or not, but I do think inactive objects are largely limited to offline players. And even with a DB, player related statistics may be most efficiently served by making a pass through the player objects during MUD initialization, and then maintaining those statistics from that point on.


Which would be the index maintenance I mentioned before. :)

Another area (mentioned by others already) where you have large amounts of "offline" data would be logs. It can be pretty handy to track object history, for example, or chat logs, or whatever. It's nice to be able to get very fast summaries of that data in a variety of formats. For example, maybe you want to log every chat line from every player, denote which room the chat line was spoken in, which other players received/heard that chat line, etc. if a player files an abuse report against another player, you can then do a (for all practical purposes) instantaneous query to get every chat line said by the reported player to/around the reporting player within a certain time frame.
28 Oct, 2008, quixadhal wrote in the 20th comment:
Votes: 0
Yes, it's very hard to do any sort of performance predictions for databases, not just because a relational database doesn't work like a set of flat files, but because RDBMS do all kinds of things in the background to optimize performance for different situtations. Some databases already do in-memory caching of index or row elements that are accessed frequently. Some can "pin" tables or indexes to memory as the user specifies the need. Almost every RDBMS will store transactions in memory and flush them out to table after a commit, as the system has a chance to do so.

If you have a SQL system, try using the EXPLAIN keyword on various queries. I've worked with some pretty complex systems in the past, and database layout combined with query structure makes a HUGE difference in performance when you get up in the millions of rows.

Quote
wiley=> explain select * from log_today where log_type = 'KILL' limit 40;
QUERY PLAN
——————————————————————————————————————
Limit (cost=12.81..12.82 rows=1 width=96)
-> Subquery Scan log_today (cost=12.81..12.82 rows=1 width=96)
-> Sort (cost=12.81..12.81 rows=1 width=75)
Sort Key: logfile.log_date
-> Nested Loop (cost=0.00..12.80 rows=1 width=75)
Join Filter: ("inner".log_type_id = "outer".log_type_id)
-> Index Scan using log_types_name_key on log_types (cost=0.00..5.82 rows=1 width=36)
Index Cond: (name = 'KILL'::text)
-> Index Scan Backward using ix_logfile_date on logfile (cost=0.00..6.77 rows=16 width=47)
Index Cond: (log_date > (now() - '1 day'::interval))
0.0/25