Touhou-Project.com

Standard Evolution

Added 2024-07-15 00:35:53 +0000 UTC

Hello all, hope you’ve been well. In this quite overdue installment of my usual ramblings I’ll be talking about some of the things I’ve been up to in the past few months and the changes on the site.

While there’s nothing too obvious to the average user, there have been quite a few bugfixes and refinements to how existing features are implemented. Things like file deletion are handled in a cleaner way, and user-facing things like relative timestamps (i.e. post was x minutes/hours/decades ago) ought to give more consistent results in most scenarios. Likewise, the RSS generation has been tightened up and more accurately reflects how content is rendered on the boards. In addition, the feeds themselves hold more posts than the previous low limit: if you’ve just subscribed to a feed or haven’t updated it in a while, you should be presented with a more comprehensive view of posting activity on the site.
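
For the curious, a relative timestamp boils down to something like the sketch below. This is only an illustration and not the site's actual code: the function name, the unit table and the "just now" cutoff are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical unit table: (seconds per unit, singular name), largest first.
# Months and years are approximated as 30 and 365 days.
_UNITS = [
    (365 * 24 * 3600, "year"),
    (30 * 24 * 3600, "month"),
    (24 * 3600, "day"),
    (3600, "hour"),
    (60, "minute"),
]


def relative_time(posted: datetime, now: datetime) -> str:
    """Describe how long before `now` the moment `posted` was."""
    seconds = int((now - posted).total_seconds())
    if seconds < 60:
        # Also covers negative values caused by clock skew.
        return "just now"
    for size, name in _UNITS:
        count = seconds // size
        if count >= 1:
            plural = "" if count == 1 else "s"
            return f"{count} {name}{plural} ago"
    return "just now"  # not reached: 60+ seconds always matches "minute"


now = datetime(2024, 7, 15, 0, 35, 53, tzinfo=timezone.utc)
print(relative_time(now - timedelta(minutes=5), now))
```

Passing `now` in explicitly, rather than reading the clock inside the function, is one way to get the same answer for the same inputs every time, which also makes it easy to test.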

I’ve also moved away from some of the more aggressive analytics that depended on client-side scripts. These have always been somewhat inaccurate due to the proliferation of things like adblock, and I’ve long since used logfile analysis to complement them. The log-based approach isn’t exactly accurate when it comes to some data (like how long a user spends on a specific page), but logs are more thorough and, if properly filtered and normalized, more comprehensive. This move isn’t really for ideological reasons (though I do think there’s too much tracking on the internet) but mainly because I couldn’t be bothered to keep setting up the script to be served on most pages, nor to keep other factors like GDPR compliance in mind for what should mainly be a quick overview of activity.
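
To give an idea of what "filtered and normalized" can mean in practice, here is a minimal sketch of counting page views from an access log. It is not the tooling actually used for the site. It assumes the common "combined" log format, and the bot markers, static file suffixes and function names are all made up for the example.

```python
import re
from collections import Counter

# Assumption: the server writes the common "combined" access log format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical filter lists; a real setup would be far more thorough.
BOT_MARKERS = ("bot", "crawler", "spider")
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def normalize(path: str) -> str:
    """Drop the query string and trailing slash so variants count as one page."""
    path = path.split("?", 1)[0].rstrip("/")
    return path or "/"


def count_page_views(lines):
    """Count successful GET requests per normalized path, skipping bots and assets."""
    views = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # malformed or unexpected line
        if match["method"] != "GET" or not match["status"].startswith("2"):
            continue
        agent = match["agent"].lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue
        path = normalize(match["path"])
        if path.lower().endswith(STATIC_SUFFIXES):
            continue
        views[path] += 1
    return views
```

Feeding it the lines of a log file (for example `count_page_views(open(log_path))`, with `log_path` being wherever the log lives) returns a `Counter` keyed by page path. Note that nothing here can tell how long a visitor stayed on a page, which is exactly the kind of data logs are weak at.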

In that spirit, some of the moderation actions triggered by script have also been simplified. Moderators and lesser-privileged users now request more specific data that is then parsed by their browser, marginally decreasing the computational workload on both server and user. Messing about with those bits of code and refactoring them also opened up the possibility of future additions in terms of data, and there are still a few work-in-progress things I’ve been trialing on my end that haven’t yet been pushed to the live version of the site. The overall idea remains making it easier, at a glance, for the staff to moderate posts and discern possible malicious patterns of activity. It’s in that same spirit that some of the automatic bans have been honed so that even less human input is required to deal with that low-hanging fruit. A helpful consequence of all this work was the hardening of the database against potential security issues (which were very situational and unlikely but, you know, you can never be too careful). Better, more consistent messages in the moderation log were another small consequence.

In this spirit of cleanup and regeneration, a few of the unused parts of the code were reorganized or outright removed. An example of the latter would be SVG image processing support, which has been virtually unused over the years and has been broken. Checks in the code would still account for it even in situations where it was disabled. Removing it goes some way towards helping with an eventual file processing overhaul and the ability to upload multiple files in a post at once.

As for the former, I bothered to apply a consistent house style to the code base. You see, even when the original authors of the software wrote and contributed to it, there were often snippets or places where the style was different to the rest of the program. Some of this was due to those parts being “borrowed” from other projects and more or less copied and pasted, some due to contributors just adding them without anyone making them conform to an overall style. This means that not only was nomenclature all over the place, but casing was inconsistent and references in other parts of the code could look strange. I haven’t done an overly zealous job, like adding descriptions and example parameters to functions, but I have at least made names and casing follow a consistent style by category. Additionally, a lot of the reused functions have been reorganized on a file level, with some of these internal libraries removed or consolidated as appropriate. A better organization is possible but, seeing as I intend on overhauling more of the basic functions in the future, I’ll be doing more as I go along.

A brief aside: I’ve also removed most of the HTML minification because of upstream problems with the library I used and also because it was too resource-intensive under some circumstances. That increases load times a little but not too terribly and it’s something that will be revisited when I do bigger changes to the user-facing parts of the site.

It’s on the note of changing user-facing parts of the site that I’ll wax slightly more philosophical. For a site that hosts fiction, reading that fiction in clearer and more comfortable ways is important; the motivation behind the ‘limit post width’ user option (the internal code refers to it as “magmode”, as in magazine) is to make comprehension easier and to bring the display more along the lines of something like a news site or an online magazine. Basically, the idea is that limited column width is more natural and less overwhelming. If you look at most sites that deal with a lot of text, or at actual print books, you’ll see that this principle is applied and that most lines contain around 80-100 characters. The original imageboard format didn’t recognize this and, indeed, was not meant for copious amounts of text. As THP evolves, a larger overhaul will be more mindful of that principle even if it retains imageboard-like characteristics.

In other words, it’ll be the de facto standard. But we’re still a ways away, with this option being opt-in. The mention of all this serves as a circuitous way of saying that this feature has gotten a tweak as well. It more uniformly hews to the aforementioned range across a wider range of screen and font sizes and, additionally, gives the text a more prominent left-placement bias. Some people may prefer centered text (and that may well be the case in a larger overhaul) but, as English (and Western languages generally) is read left-to-right, our eyes are accustomed to starting from a left margin to see content. I think that we’re currently at a good default (I’ve used magmode to browse the site for years) and it’s a good stepping stone for considering future design decisions; a variable gap to either side also allows “floating” content like post boxes or watched threads to exist alongside content, which is a nice bonus.

And, speaking of parsing text, I’ve buried the lede somewhat. One of the biggest changes I’ve made has been to how posts themselves are parsed and rendered. The brief explanation is that text within a post is treated in the most basic of forms with regard to how a browser interprets it. This, in most cases, is fine. If you have rich text options, say adding bold or marking things as a spoiler, then you’re introducing HTML (and in our case, CSS) into the equation. Things can get rendered fairly consistently and hassle-free but, as our needs grow more complex, the more limitations we run into. As stated previously, we’re a fiction-heavy site. And formatting text in a consistent way is a good idea not just for readers but for writers; getting consistent spacing or divisions or fancy headers or whatever else isn’t essential but adds a lot to the experience.

Manipulating unfurnished text to do things like, say, changing line spacing, justification, gaps between paragraphs and whatever else can be done but it gets increasingly hacky and non-standard as we go along. And as we’re rapidly entering another dark age of standards with certain browsers (cough Google’s Chrome) making more opinionated decisions about how content should look or render by default, getting stricter with how our text is served up becomes more important.

To that end, all content posted as of some time ago is now wrapped in paragraph tags. Pains have been taken to preserve the look of the old way of text but this, in theory, opens the door to more granular options down the line, like indentation or user-defined spacing between text. Eventually, old text will hopefully be parsed along these lines as well but I’m in no rush to do so since it’s a massive undertaking to take into account all the edge cases. … In fact, some of what I’m currently working on, and have not finished, is along these lines and has kept me busy for a few weeks; more on that in the future.
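
As a rough illustration of the idea (not the site's real parser), wrapping raw post text in paragraph tags could look like the sketch below. The rules are assumptions made for the example: a blank line starts a new paragraph, a single newline becomes a `<br>`, and the function name `wrap_paragraphs` is invented.

```python
import html
import re


def wrap_paragraphs(raw: str) -> str:
    """Turn raw post text into a series of <p> elements."""
    # Normalize line endings so the splitting rule only has to handle "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n").strip()
    if not text:
        return ""
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    rendered = []
    for paragraph in paragraphs:
        # Escape user input first so stray "<" or "&" cannot become markup.
        escaped = html.escape(paragraph, quote=False)
        # Assumption: a single newline inside a paragraph is a line break.
        rendered.append("<p>" + escaped.replace("\n", "<br>") + "</p>")
    return "\n".join(rendered)


print(wrap_paragraphs("First line\nsecond line\n\nNext paragraph"))
```

Once every block of text sits in its own `<p>`, things like indentation or the gap between paragraphs can in principle be handled by CSS on that element rather than by manipulating the text itself.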

The thing about changing something so fundamental about how text is handled and presented is that a lot of different systems are impacted. Think of our own in-built BBCode (e.g. [b] and [spoiler] tags). These are rendered as inline <span> elements, which could get away with long multiline enclosures and things which would work most times but, referring back to the strangeness of standards these days, have no real guarantee of continuing to work in the same fashion. As <p> (paragraph) elements are now included, <span> should always be contained within them and you start getting even less consistent behavior when they’re not; things might look more-or-less the same without changes to the code but it’s still going to be invalid HTML that is subject to the caprices of specific browsers and their engines. A partial overhaul of the parts of code that deal with parsing these tags had to be made, and I had plenty of fun using regular expressions to make things as powerful and comprehensive as needed while keeping them as simple as possible. So unclosed tags now limit themselves to a paragraph (or to the end of the input) and close automatically, and multiple applications of these spans should resolve in a more predictable manner. There’s a slight trade-off in that these tags only apply to a single paragraph (you need to add the tag to the next paragraph too if more needs to be italicized, for example) but it yields a stricter, more complete output that is less likely to be misinterpreted by browsers and is semantically immaculate.
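
The behavior described above can be sketched roughly as follows. This is a toy version under stated assumptions and not the code running on the site: it uses a regular expression only to find tags and a simple stack to track open ones, and the tag list, CSS class names and function names are all hypothetical.

```python
import html
import re

# Hypothetical mapping of BBCode tag names to CSS classes.
SPAN_CLASSES = {"b": "bold", "i": "italic", "spoiler": "spoiler"}
TAG_RE = re.compile(r"\[(/?)(b|i|spoiler)\]")


def render_paragraph(text: str) -> str:
    """Convert BBCode in one paragraph to spans, closing anything left open."""
    out, stack, pos = [], [], 0
    for match in TAG_RE.finditer(text):
        out.append(html.escape(text[pos:match.start()], quote=False))
        pos = match.end()
        is_close, name = bool(match.group(1)), match.group(2)
        if not is_close:
            stack.append(name)
            out.append(f'<span class="{SPAN_CLASSES[name]}">')
        elif name in stack:
            # Close inner spans first so the output stays properly nested.
            while stack:
                closed = stack.pop()
                out.append("</span>")
                if closed == name:
                    break
        else:
            # A closing tag with nothing to close is kept as literal text.
            out.append(html.escape(match.group(0), quote=False))
    out.append(html.escape(text[pos:], quote=False))
    # Unclosed tags end with the paragraph instead of leaking into the next.
    out.extend("</span>" for _ in stack)
    return "".join(out)


def render_post(raw: str) -> str:
    """Split on blank lines and render each paragraph independently."""
    text = raw.replace("\r\n", "\n").strip()
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p]
    return "\n".join(
        "<p>" + render_paragraph(p).replace("\n", "<br>") + "</p>"
        for p in paragraphs
    )


print(render_post("[b]bold and never closed\n\nplain again"))
```

Because each paragraph is rendered on its own, every `<span>` is guaranteed to open and close inside a single `<p>`, which is what keeps the output valid. The cost is the trade-off already mentioned: a tag has to be repeated in each paragraph it should apply to.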

There’s a knock-on effect, with these changes needing not only the internal parser to be changed but also things like the editing functions in the user pages, user-side scripting and other things. The same goes for a lot of the other things I’ve mentioned. While I’ve been really sloppy with making regular posts here (sorry! I’ll try to be better), a lot of my free time on weekends has gone to implementing these things and then doing thorough debugging and testing. It’ll still be a few weekends yet before I complete some of the other things in progress, like reworking old text and stories, or get around to finishing up the pleasant surprises I’ve had in the works, but progress is steady.

I think I’ll try doing more status updates regularly, even if they’re a little light on content, to keep a better rhythm. I’ll be doing posts for the higher tiers as well because that’s also been neglected. Whatever else, I hope that until next time, you take it easy!