hi jmv

Started by Kalahari Inkantation, June 10, 2017, 10:24:59 PM

Kalahari Inkantation

Quote from: Khadafi on June 11, 2017, 12:05:06 PM
also i wrote a test script that parses out quotes but it gets kind of tricky with nested quotes


is there something that makes nested quotes harder for it to ignore than single quotes?

Daddy

Quote from: Majorana's Mask on June 11, 2017, 01:12:16 PM
Quote from: Khadafi on June 11, 2017, 12:05:06 PM
also i wrote a test script that parses out quotes but it gets kind of tricky with nested quotes


is there something that makes nested quotes harder for it to ignore than single quotes?


well i was going to say it only caught the first [/quote] but i fixed that

Code Select
$re = '`\[quote.*\].*\[\/quote\]`s';
also looks like i fixed the snipped quote issue

Daddy

Quote from: Majorana's Mask on June 11, 2017, 12:04:56 PM
[[[Lord Kazm]]]
Testing that now

cuck
Quote from: Majorana's Mask on June 11, 2017, 11:48:10 AM
you should write one and unleash it on boyah

have it make a nonsensical markov'd boyahpost a day
this is a test


Daddy

nope nvm.

it wouldn't be able to parse out anything but the last line of that previous post.

The sentence between the two quotes would get filtered out.

bluaki

June 11, 2017, 11:05:23 PM #19 Last Edit: June 11, 2017, 11:39:45 PM by bluaki
this is a textbook example of something you can't totally solve with regular expressions: matching nested pairs means counting how deep you are, and counting is a fundamental limitation of regular languages

this is also why syntax highlighting systems that exclusively rely on regex are bad

you have to basically search for every instance of [quote] and [/quote] and count the number of open quote tags for each part of the post

pseudocode for this is something like:
Code Select
pos = 0
while match = search(post, "[quote]", starting at pos):
    append everything from pos up to (not including) match.offset to the output
    nesting = 1
    for each later match of either "[quote]" or "[/quote]", scanning from the end of the opening tag:
        if it's a "[quote]": nesting++
        if it's a "[/quote]": nesting--
        if nesting == 0: break
    pos = match.offset + match.length   # match is now the closing tag that brought nesting back to 0
append everything between pos and the end of the post to the output


if you want to match what's inside the brackets of a start-of-quote tag, you need to exclude the end bracket from the match, or else a greedy ".*" will keep matching up to the last bbcode tag before the end quote
Code Select
\[quote[^]]*\]
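
A rough Python sketch of the counting approach above, for illustration only. The name strip_quotes is made up, and the choice to drop stray [/quote] tags and to discard everything after a quote that never closes is an assumption, not something the forum software is known to do.

```python
import re

# Matches an opening quote tag (with or without attributes) or a closing tag.
TAG = re.compile(r"\[quote[^]]*\]|\[/quote\]")

def strip_quotes(post):
    """Return post with every (possibly nested) quote block removed."""
    out, pos = [], 0
    while (m := TAG.search(post, pos)):
        out.append(post[pos:m.start()])    # keep the text before this tag
        pos = m.end()
        if m.group() == "[/quote]":
            continue                       # stray close tag: drop it (assumption)
        nesting = 1
        for t in TAG.finditer(post, pos):  # scan forward, counting open quotes
            nesting += -1 if t.group() == "[/quote]" else 1
            pos = t.end()
            if nesting == 0:
                break
        else:
            pos = len(post)                # quote never closed: drop the rest (assumption)
    out.append(post[pos:])                 # keep the text after the last quote
    return "".join(out)

print(strip_quotes("a[quote]b[quote]c[/quote]d[/quote]e"))  # prints "ae"
```

The depth counter is what a single regex can't do: the text between the two inner tags ("d" here) is only dropped because the loop knows it's still one level deep.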

Daddy

Quote from: bluaki on June 11, 2017, 11:05:23 PM
if you want to match inside the brackets of a start-of-quote, you need to exclude the end bracket or else you'll continue matching up to the last bbcode tag before the end quote
Code Select
\[quote[^]]*\]


but then you have things like
Code Select
[quote author=cuckman]
that needs to be accounted for, though that's just an or statement really

Daddy

excuse me but if im going to compete i need posts to reply to

bluaki

June 12, 2017, 09:19:01 AM #22 Last Edit: June 12, 2017, 09:23:39 AM by bluaki
Quote from: Khadafi on June 12, 2017, 07:40:42 AM
but then you have things like
Code Select
[quote author=cuckman]
that needs to be accounted for, though that's just an or statement really
In that pseudocode I used "[quote]" for clarity, but if you replace every instance of that with the regular expression at the end of my post, it'll handle optional attributes like author and date. I just replaced the ".*" in your expression with "[^]]*" to fix the problem of your pattern matching way too far past the first right-bracket.
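
A small Python check of that difference, as a sketch. The sample post text is invented, and only the opening-tag part of the pattern is tested here, not the whole quote expression.

```python
import re

# Made-up sample post: one quote with an author attribute, then a reply with other bbcode.
post = "[quote author=Khadafi]old text[/quote] new [b]reply[/b]"

# Greedy ".*" runs to the last "]" in the post.
greedy = re.match(r"\[quote.*\]", post).group()

# "[^]]*" stops at the first "]" because it refuses to match a right-bracket.
bounded = re.match(r"\[quote[^]]*\]", post).group()

print(greedy)   # prints the whole post
print(bounded)  # prints "[quote author=Khadafi]"
```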

Kalahari Inkantation

Quote from: Khadafi on June 12, 2017, 09:02:11 AM
excuse me but if im going to compete i need posts to reply to


reply to blu because i think she's onto something

boyah, with its rich ten-year history of wacky posts, deserves its very own markov chain

Daddy


Kalahari Inkantation

bump

is a boyah markov chain even really within the realm of feasibility lol

Kalahari Inkantation

what if we were to automatically mirror random boyah posts without formatting to some secure database (like, probably even a basic google drive account accessible only to the bot and its keeper would suffice), and had our markov bot draw from that instead of directly from boyah

that way it wouldn't have to parse quotes at all
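
For a sense of scale, a word-level markov chain over a list of already-plain-text posts is only a few lines of Python. This is a minimal sketch: build_chain and generate are made-up names, the sample posts are placeholders, and a real bot would still need the mirroring step and some way to pick a starting word.

```python
import random
from collections import defaultdict

def build_chain(posts):
    """Map each word to the list of words seen directly after it."""
    chain = defaultdict(list)
    for post in posts:
        words = post.split()
        for cur, nxt in zip(words, words[1:]):
            chain[cur].append(nxt)
    return chain

def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from start, picking a random follower at each step."""
    word, out = start, [start]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:          # dead end: no word ever followed this one
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

# Placeholder posts standing in for the mirrored, formatting-free archive.
posts = ["this is a test", "this is a bump"]
chain = build_chain(posts)
print(generate(chain, "this"))
```

Since the input is plain text, nothing here has to know about quote tags at all.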
