Go Back   Rhinocerus > Newsgroup > Newsgroup comp.lang.python

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 08-14-2012, 03:38 PM
light1quark@gmail.com
Guest
 
Posts: n/a
Default Strange behavior

Hi, I am migrating from PHP to Python and I am slightly confused.

I am making a function that takes a startingList, finds all the strings in the list that begin with 'x', removes those strings and puts them into a xOnlyList.

However if you run the code you will notice only one of the strings beginning with 'x' is removed from the startingList.
If I comment out 'startingList.remove(str);' the code runs with both strings beginning with 'x' being put in the xOnlyList.
Using the print statement I noticed that the second string that begins with 'x' isn't even identified by the function. Why does this happen?

def testFunc(startingList):
xOnlyList = [];
for str in startingList:
if (str[0] == 'x'):
print str;
xOnlyList.append(str)
startingList.remove(str) #this seems to be the problem
print xOnlyList;
print startingList
testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])

#Thanks for your help!
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #2 (permalink)  
Old 08-14-2012, 03:59 PM
Alain Ketterlin
Guest
 
Posts: n/a
Default Re: Strange behavior

light1quark@gmail.com writes:

> However if you run the code you will notice only one of the strings
> beginning with 'x' is removed from the startingList.


>
> def testFunc(startingList):
> xOnlyList = [];
> for str in startingList:
> if (str[0] == 'x'):
> print str;
> xOnlyList.append(str)
> startingList.remove(str) #this seems to be the problem
> print xOnlyList;
> print startingList
> testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])
>
> #Thanks for your help!


Try with ['xasd', 'sefwr', 'xjkl', 'dfsews'] and you'll understand what
happens. Also, have a look at:

http://docs.python.org/reference/com...-for-statement

You can't modify the list you're iterating on, better use another list
to collect the result.

-- Alain.

P/S: str is a builtin, you'd better avoid assigning to it.
Reply With Quote
  #3 (permalink)  
Old 08-14-2012, 07:05 PM
Terry Reedy
Guest
 
Posts: n/a
Default Re: Strange behavior

On 8/14/2012 11:59 AM, Alain Ketterlin wrote:
> light1quark@gmail.com writes:
>
>> However if you run the code you will notice only one of the strings
>> beginning with 'x' is removed from the startingList.

>
>>
>> def testFunc(startingList):
>> xOnlyList = [];
>> for str in startingList:
>> if (str[0] == 'x'):
>> print str;
>> xOnlyList.append(str)
>> startingList.remove(str) #this seems to be the problem
>> print xOnlyList;
>> print startingList
>> testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])
>>
>> #Thanks for your help!

>
> Try with ['xasd', 'sefwr', 'xjkl', 'dfsews'] and you'll understand what
> happens. Also, have a look at:
>
> http://docs.python.org/reference/com...-for-statement
>
> You can't modify the list you're iterating on,


Except he obviously did ;-).
(Modifying set or dict raises SomeError.)

Indeed, people routine *replace* items while iterating.

def squarelist(lis):
for i, n in enumerate(lis):
lis[i] = n*n
return lis

print(squarelist([0,1,2,3,4,5]))
# [0, 1, 4, 9, 16, 25]

Removals can be handled by iterating in reverse. This works even with
duplicates because if the item removed is not the one tested, the one
tested gets retested.

def removeodd(lis):
for n in reversed(lis):
if n % 2:
lis.remove(n)
print(n, lis)

ll = [0,1, 5, 5, 4, 5]
removeodd(ll)
>>>

5 [0, 1, 5, 4, 5]
5 [0, 1, 4, 5]
5 [0, 1, 4]
4 [0, 1, 4]
1 [0, 4]
0 [0, 4]

> better use another list to collect the result.


If there are very many removals, a new list will be faster, even if one
needs to copy the new list back into the original, as k removals from
len n list is O(k*n) versus O(n) for new list and copy.

> P/S: str is a builtin, you'd better avoid assigning to it.


Agreed. People have actually posted code doing something like

....
list = [1,2,3]
....
z = list(x)
....
and wondered and asked why it does not work.

--
Terry Jan Reedy

Reply With Quote
  #4 (permalink)  
Old 08-14-2012, 07:20 PM
light1quark@gmail.com
Guest
 
Posts: n/a
Default Re: Strange behavior

I got my answer by reading your posts and referring to: http://docs.python.org/reference/com...-for-statement
(particularly the shaded grey box)

I guess I should have (obviously) looked at the doc's before posting here; but im a noob.

Thanks for your help.
Reply With Quote
  #5 (permalink)  
Old 08-14-2012, 07:40 PM
Virgil Stokes
Guest
 
Posts: n/a
Default Re: Strange behavior

On 2012-08-14 17:38, light1quark@gmail.com wrote:
> Hi, I am migrating from PHP to Python and I am slightly confused.
>
> I am making a function that takes a startingList, finds all the strings in the list that begin with 'x', removes those strings and puts them into a xOnlyList.
>
> However if you run the code you will notice only one of the strings beginning with 'x' is removed from the startingList.
> If I comment out 'startingList.remove(str);' the code runs with both strings beginning with 'x' being put in the xOnlyList.
> Using the print statement I noticed that the second string that begins with 'x' isn't even identified by the function. Why does this happen?
>
> def testFunc(startingList):
> xOnlyList = [];
> for str in startingList:
> if (str[0] == 'x'):
> print str;
> xOnlyList.append(str)
> startingList.remove(str) #this seems to be the problem
> print xOnlyList;
> print startingList
> testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])
>
> #Thanks for your help!


You might find the following useful:

def testFunc(startingList):
xOnlyList = []; j = -1
for xl in startingList:
if (xl[0] == 'x'):
xOnlyList.append(xl)
else:
j += 1
startingList[j] = xl
if j == -1:
startingList = []
else:
del startingList[j:-1]

return(xOnlyList)


testList1 = ['xasd', 'xjkl', 'sefwr', 'dfsews']
testList2 = ['xasd', 'xjkl', 'xsefwr', 'xdfsews']
testList3 = ['xasd', 'jkl', 'sefwr', 'dfsews']
testList4 = ['asd', 'jkl', 'sefwr', 'dfsews']

xOnlyList = testFunc(testList1)
print 'xOnlyList = ',xOnlyList
print 'testList = ',testList1
xOnlyList = testFunc(testList2)
print 'xOnlyList = ',xOnlyList
print 'testList = ',testList2
xOnlyList = testFunc(testList3)
print 'xOnlyList = ',xOnlyList
print 'testList = ',testList3
xOnlyList = testFunc(testList4)
print 'xOnlyList = ',xOnlyList
print 'testList = ',testList4

And here is another version using list comprehension that I prefer

testList1 = ['xasd', 'xjkl', 'sefwr', 'dfsews']
testList2 = ['xasd', 'xjkl', 'xsefwr', 'xdfsews']
testList3 = ['xasd', 'jkl', 'sefwr', 'dfsews']
testList4 = ['asd', 'jkl', 'sefwr', 'dfsews']

def testFunc2(startingList):
return([x for x in startingList if x[0] == 'x'], [x for x in
startingList if x[0] != 'x'])

xOnlyList,testList = testFunc2(testList1)
print xOnlyList
print testList
xOnlyList,testList = testFunc2(testList2)
print xOnlyList
print testList
xOnlyList,testList = testFunc2(testList3)
print xOnlyList
print testList
xOnlyList,testList = testFunc2(testList4)
print xOnlyList
print testList

Reply With Quote
  #6 (permalink)  
Old 08-14-2012, 09:55 PM
Chris Angelico
Guest
 
Posts: n/a
Default Re: Strange behavior

On Wed, Aug 15, 2012 at 1:38 AM, <light1quark@gmail.com> wrote:
> def testFunc(startingList):
> xOnlyList = [];
> for str in startingList:
> if (str[0] == 'x'):
> print str;
> xOnlyList.append(str)
> startingList.remove(str) #this seems to be the problem
> print xOnlyList;
> print startingList
> testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])


Other people have explained the problem with your code. I'll take this
example as a way of introducing you to one of Python's handy features
- it's an idea borrowed from functional languages, and is extremely
handy. It's called the "list comprehension", and can be looked up in
the docs under that name,

def testFunc(startingList):
xOnlyList = [strng for strng in startingList if strng[0] == 'x']
startingList = [strng for strng in startingList if strng[0] != 'x']
print(xOnlyList)
print(startingList)

It's a compact notation for building a list from another list. (Note
that I changed "str" to "strng" to avoid shadowing the built-in name
"str", as others suggested.)

(Unrelated side point: Putting parentheses around the print statements
makes them compatible with Python 3, in which 'print' is a function.
Unless something's binding you to Python 2, consider working with the
current version - Python 2 won't get any more features added to it any
more.)

Python's an awesome language. You may have to get your head around a
few new concepts as you shift thinking from PHP's, but it's well worth
while.

Chris Angelico
Reply With Quote
  #7 (permalink)  
Old 08-15-2012, 12:19 AM
Steven D'Aprano
Guest
 
Posts: n/a
Default Re: Strange behavior

On Tue, 14 Aug 2012 21:40:10 +0200, Virgil Stokes wrote:

> You might find the following useful:
>
> def testFunc(startingList):
> xOnlyList = []; j = -1
> for xl in startingList:
> if (xl[0] == 'x'):


That's going to fail in the starting list contains an empty string. Use
xl.startswith('x') instead.


> xOnlyList.append(xl)
> else:
> j += 1
> startingList[j] = xl


Very cunning, but I have to say that your algorithm fails the "is this
obviously correct without needing to study it?" test. Sometimes that is
unavoidable, but for something like this, there are simpler ways to solve
the same problem.


> if j == -1:
> startingList = []
> else:
> del startingList[j:-1]
> return(xOnlyList)



> And here is another version using list comprehension that I prefer


> def testFunc2(startingList):
> return([x for x in startingList if x[0] == 'x'], [x for x in
> startingList if x[0] != 'x'])


This walks over the starting list twice, doing essentially the same thing
both times. It also fails to meet the stated requirement that
startingList is modified in place, by returning a new list instead.
Here's an example of what I mean:

py> mylist = mylist2 = ['a', 'x', 'b', 'xx', 'cx'] # two names for one
list
py> result, mylist = testFunc2(mylist)
py> mylist
['a', 'b', 'cx']
py> mylist2 # should be same as mylist
['a', 'x', 'b', 'xx', 'cx']

Here is the obvious algorithm for extracting and removing words starting
with 'x'. It walks the starting list only once, and modifies it in place.
The only trick needed is list slice assignment at the end.

def extract_x_words(words):
words_with_x = []
words_without_x = []
for word in words:
if word.startswith('x'):
words_with_x.append(word)
else:
words_without_x.append(word)
words[:] = words_without_x # slice assignment
return words_with_x


The only downside of this is that if the list of words is so enormous
that you can fit it in memory *once* but not *twice*, this may fail. But
the same applies to the list comprehension solution.



--
Steven
Reply With Quote
  #8 (permalink)  
Old 08-15-2012, 09:50 AM
Alain Ketterlin
Guest
 
Posts: n/a
Default Re: Strange behavior

Chris Angelico <rosuav@gmail.com> writes:

> Other people have explained the problem with your code. I'll take this
> example as a way of introducing you to one of Python's handy features
> - it's an idea borrowed from functional languages, and is extremely
> handy. It's called the "list comprehension", and can be looked up in
> the docs under that name,
>
> def testFunc(startingList):
> xOnlyList = [strng for strng in startingList if strng[0] == 'x']
> startingList = [strng for strng in startingList if strng[0] != 'x']
> print(xOnlyList)
> print(startingList)
>
> It's a compact notation for building a list from another list. (Note
> that I changed "str" to "strng" to avoid shadowing the built-in name
> "str", as others suggested.)


Fully agree with you: list comprehension is, imo, the most useful
program construct ever. Extremely useful.

But not when it makes the program traverse twice the same list, where
one traversal is enough.

-- Alain.
Reply With Quote
  #9 (permalink)  
Old 08-15-2012, 09:57 AM
Alain Ketterlin
Guest
 
Posts: n/a
Default Re: Strange behavior

light1quark@gmail.com writes:

> I got my answer by reading your posts and referring to:
> http://docs.python.org/reference/com...-for-statement
> (particularly the shaded grey box)


Not that the problem is not specific to python (if you erase the current
element when traversing a STL list in C++ you'll get a crash as well).

> I guess I should have (obviously) looked at the doc's before posting
> here; but im a noob.


Python has several surprising features. I think it is a good idea to
take some time to read the language reference, from cover to cover
(before or after the various tutorials, depending on your background).

-- Alain.
Reply With Quote
  #10 (permalink)  
Old 08-16-2012, 11:18 AM
Virgil Stokes
Guest
 
Posts: n/a
Default Re: Strange behavior

On 15-Aug-2012 02:19, Steven D'Aprano wrote:
> On Tue, 14 Aug 2012 21:40:10 +0200, Virgil Stokes wrote:
>
>> You might find the following useful:
>>
>> def testFunc(startingList):
>> xOnlyList = []; j = -1
>> for xl in startingList:
>> if (xl[0] == 'x'):

> That's going to fail in the starting list contains an empty string. Use
> xl.startswith('x') instead.

Yes, but this was by design (tacitly assumed that startingList was both a list
and non-empty).
>
>
>> xOnlyList.append(xl)
>> else:
>> j += 1
>> startingList[j] = xl

> Very cunning, but I have to say that your algorithm fails the "is this
> obviously correct without needing to study it?" test. Sometimes that is
> unavoidable, but for something like this, there are simpler ways to solve
> the same problem.

Sorry, but I do not sure what you mean here.
>
>
>> if j == -1:
>> startingList = []
>> else:
>> del startingList[j:-1]
>> return(xOnlyList)

>
>> And here is another version using list comprehension that I prefer
>> def testFunc2(startingList):
>> return([x for x in startingList if x[0] == 'x'], [x for x in
>> startingList if x[0] != 'x'])

> This walks over the starting list twice, doing essentially the same thing
> both times. It also fails to meet the stated requirement that
> startingList is modified in place, by returning a new list instead.

This can meet the requirement that startingList is modified in place via the
call to this function (see the attached code).
> Here's an example of what I mean:
>
> py> mylist = mylist2 = ['a', 'x', 'b', 'xx', 'cx'] # two names for one
> list
> py> result, mylist = testFunc2(mylist)
> py> mylist
> ['a', 'b', 'cx']
> py> mylist2 # should be same as mylist
> ['a', 'x', 'b', 'xx', 'cx']

Yes, I had a typo in my original posting --- sorry about that!
>
> Here is the obvious algorithm for extracting and removing words starting
> with 'x'. It walks the starting list only once, and modifies it in place.
> The only trick needed is list slice assignment at the end.
>
> def extract_x_words(words):
> words_with_x = []
> words_without_x = []
> for word in words:
> if word.startswith('x'):
> words_with_x.append(word)
> else:
> words_without_x.append(word)
> words[:] = words_without_x # slice assignment
> return words_with_x

Suppose words was not a list --- you have tacitly assumed that words is a list.
>
> The only downside of this is that if the list of words is so enormous
> that you can fit it in memory *once* but not *twice*, this may fail. But
> the same applies to the list comprehension solution.

But, this is not the only downside if speed is important --- it is slower than
the list comprehension method (see results that follows).

Here is a summary of three algorithms (algorithm-1, algorithm-2, algorithm-2A)
that I tested (see attached code). Note, algorithm-2A was obtained by removing
the slice assignment in the above code and modifying the return as follows

def extract_x_words(words):
words_with_x = []
words_without_x = []
for word in words:
if word.startswith('x'):
words_with_x.append(word)
else:
words_without_x.append(word)
#words[:] = words_without_x # slice assignment
return words_with_x, words_without_x

Of course, one needs to modify the call for "in-place" update of startingList as
follows:

xOnlyList,startingList = extract_x_words(startingList)

Here is a summary of my timing results obtained for 3 different algorithms for
lists with 100,000 strings of length 4 in each list:

Method
average (sd) time in seconds
algorithm-1 (list comprehension)
0.11630 (0.0014)
algorithm-2 (S. D'Aprano)
0.17594 (0.0014)
algorithm-2A (modified S. D'Aprano)
0.18217 (0.0023)


These values were obtained from 100 independent runs (MC simulations) on lists
that contain 100,000 strings. Approximately 50% of these strings contained a
leading 'x'. Note, that the results show that algorithm-2 (suggested by S.
D'Aprano) is approximately 51% slower than algorithm-1 (list comprehensions) and
algorithm-2A (simple modification of algorithm-2) is approximately 57% slower
than algorithm-1. Why is algorithm-2A slower than algorithm-2?

I would be interested in seeing code that is faster than algorithm-1 --- any
suggestions are welcomed. And of course, if there are any errors in my attached
code please inform me of them and I will try to correct them as soon as
possible. Note, some of the code is actually irrelevant for the original
"Strange behavior" post.

Have a good day!


Reply With Quote
  #11 (permalink)  
Old 08-16-2012, 01:02 PM
Peter Otten
Guest
 
Posts: n/a
Default Re: Strange behavior

Virgil Stokes wrote:

>>> def testFunc(startingList):
>>>xOnlyList = []; j = -1
>>>for xl in startingList:
>>>if (xl[0] == 'x'):

>> That's going to fail in the starting list contains an empty string. Use
>> xl.startswith('x') instead.

> Yes, but this was by design (tacitly assumed that startingList was both a
> list and non-empty).


You missunderstood it will fail if the list contains an empty string, not if
the list itself is empty:

>>> words = ["alpha", "", "xgamma"]
>>> [word for word in words if word[0] == "x"]

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range

The startswith() version:

>>> [word for word in words if word.startswith("x")]

['xgamma']

Also possible:

>>> [word for word in words if word[:1] == "x"]

['xgamma']

> def testFunc1(startingList):
> '''
> Algorithm-1
> Note:
> One should check for an empty startingList before
> calling testFunc1 -- If this possibility exists!
> '''
> return([x for x in startingList if x[0] == 'x'],
> [x for x in startingList if x[0] != 'x'])
>
>
> I would be interested in seeing code that is faster than algorithm-1


In pure Python? Perhaps the messy variant:

def test_func(words):
nox = []
append = nox.append
withx = [x for x in words if x[0] == 'x' or append(x)]
return withx, nox


Reply With Quote
 
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off




All times are GMT. The time now is 01:52 PM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.