[Freeciv-Dev] Re: Split patch (was Re: [RFC PATCH] init_techs)

To: Freeciv Developers <freeciv-dev@xxxxxxxxxxx>
Subject: [Freeciv-Dev] Re: Split patch (was Re: [RFC PATCH] init_techs)
From: Justin Moore <justin@xxxxxxxxxxx>
Date: Wed, 26 Sep 2001 19:44:25 -0400 (EDT)

> > > > > What about a strdup() in split?
> > > >
> > > >    I could, but I really think all memory allocation and de-allocation
> > > > should be done at the same level, so-to-speak.
> > >
> > > What is the problem? split() will strdup() the string it gets and
> > > frees its copy later.
> >
> > char *buf = "   foo bar ";
> > char *args[2];
> > int found = split("\S", buf, args, 2);
> >
> > If, within split, I do
> >    char *copybuf = strdup(buf);
> > and mess around with copybuf, I'm going to parse away the first few chars
> > of it.  args[0] != copybuf, since I've cut away the first few characters.
> > How will I know what to free?
>
> I was thinking that split() will free the copybuf.

   I think I need some sample code to see how this works.  Even pseudocode
could go a long way.  Maybe I'm just missing something simple here.
Given this function declaration:

int split(const char *toks, char *buf, char *args[], const int maxargs);

what happens where, allocation-wise?

> > > >    On a side note, I'm going to remove the "automatically removes
> > > > whitespace" implementation, and require the caller to pass a "\S" as a
> > > > token if they want to split on whitespace.  If they want to remove
> > > > whitespace surrounding other split tokens, they would pass in a "\s".  A
> > > > "\S" implies "\s".
> > > >
> > > > split("\s,", buf, args, 5);
> > > >
> > > > will correctly parse
> > > >
> > > > Alphabet,Iron Working,Pottery, The Wheel
> > > >
> > > > Comments?
> > >
> > > I would rather like to see this explicitly as an argument of split.
> >
> >    I think it's cleaner to keep fewer arguments.  I don't want this
> > turning into MFCs, where you have to pass 17 NULLs or 0s into a function
> > to get it right. ;p Just know some basic regex patterns and it works.
>
> Then make a
>
> enum split_flags {
>  REMOVE_SPLIT_ON_WS=1,
>  REMOVE_REMOVE_WS=2
> };

   Ok, I know that's what you meant. ;p It just seems that your
suggestions would lead to this definition:

int split(const char *toks, char *buf, char *args[], const int maxargs,
          const int maxarg_length, const enum split_flags howto_split);

which just seems a bit, well ... overdone.  Plus you'd need a third
enum value: REMOVE_NO_WHITESPACE = 3.

   I haven't really seen any concrete arguments from you about *why* your
way is better.  My arguments are:

- It's no worse than strtok in terms of memory management, and
    functionally more useful.
- The caller is responsible for strdup'ing and free'ing from within the
    same function, which is slightly more sane than random functions
    allocating memory within them and passing it back.
- Allowing a restricted subset of regex-like syntax leads to fewer
    function parameters, which is easier on the programmer.
- My way requires no memory allocation, provided the caller is willing to
    accept the buffer they pass in will be changed.
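
   To make the caller-owned pattern concrete, here is a minimal sketch.
It is not the actual patch: split_in_place() and caller_owned_demo() are
hypothetical names, the delimiter argument is a plain character set with
none of the "\s" / "\S" handling, and empty fields are skipped the way
strtok skips them.  The splitter only writes '\0' bytes into the buffer it
is given; the strdup() and the free() both live in the caller.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified splitter: cuts buf at any character found in
 * delims by writing '\0' into buf, and stores pointers into buf in args[].
 * It allocates nothing and returns the number of fields stored. */
int split_in_place(const char *delims, char *buf, char *args[], int maxargs)
{
  int found = 0;

  assert(delims != NULL && buf != NULL && args != NULL);

  while (found < maxargs) {
    buf += strspn(buf, delims);     /* skip leading delimiters */
    if (*buf == '\0') {
      break;                        /* nothing left to split */
    }
    args[found++] = buf;            /* field starts here */
    buf += strcspn(buf, delims);    /* move to the end of the field */
    if (*buf == '\0') {
      break;                        /* field ran to the end of the buffer */
    }
    *buf++ = '\0';                  /* terminate the field in place */
  }
  return found;
}

/* Caller side: the copy is made and freed in the same function, so the
 * pointer passed to free() is exactly the one strdup() returned, no matter
 * where args[0] ends up pointing. */
int caller_owned_demo(const char *line)
{
  char *args[5];
  char *copy = strdup(line);
  int i, found;

  if (copy == NULL) {
    return -1;
  }
  found = split_in_place(",", copy, args, 5);
  for (i = 0; i < found; i++) {
    printf("args[%d] = \"%s\"\n", i, args[i]);
  }
  free(copy);                       /* args[] must not be used after this */
  return found;
}
```

   A caller that does not mind its buffer being modified can skip the
strdup() entirely and pass the buffer straight to split_in_place().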

   It seems at least one person has agreed with me so far.  I think we
just need more people in on this, even if it's only "Raimar's right, and
Justin's being an idiot" or vice versa. :)

-jdm

Department of Computer Science, Duke University, Durham, NC 27708-0129
Email:  justin@xxxxxxxxxxx


