|
|||
|
Hi,
we just set up a SQL Server 2008 on Windows Server 2008 (configured as file / webserver, 16 GB memory, 4 dual core processors, 6 internal disks + 15 disc ISCSI array with overall almost 3 TB space) to use filestream and ran into the following problem: We extracted binaries from our 'original' database and imported it into the new 08 database using filestream - so far, so good. In the beginning everything looked pretty good, we extracted / imported over 100.000 rows per hour but over time the performance degraded significantly (down to a roughly 4500 per hour). At this point, we have about 1 million files in our filesteam filegroup (which means in one directory on our binary array) but there are 500.000 more to go - and at this speed this will take days (not even thinking about what it might mean for performance afterwards). Searching for reasons, we realized that the inserts into the new table took most of the time (over 80%), so we first stopped and rebuilt the only index on that one (primary key) and, after realizing that this did not help at all, we dropped it completely (since uniqueness is assured through the original table anyways) - same result, no change, still way too slow. Did anybody run into a similar problem so far or is it in the end not even a SQL server but a Windows / File System problem (since through filestreaming millions of files go into a single folder) ?! Any ideas or tips would be appreciated ... thanks in advance ! Kind regards Andy |
|
|
||||
|
||||
|
|
|
|||
|
Andreas Zimmermann (souldiver@nurfuerspam.de) writes:
> we just set up a SQL Server 2008 on Windows Server 2008 (configured as > file / webserver, 16 GB memory, 4 dual core processors, 6 internal disks > + 15 disc ISCSI array with overall almost 3 TB space) to use filestream > and ran into the following problem: > > We extracted binaries from our 'original' database and imported it into > the new 08 database using filestream - so far, so good. > > In the beginning everything looked pretty good, we extracted / imported > over 100.000 rows per hour but over time the performance degraded > significantly (down to a roughly 4500 per hour). At this point, we have > about 1 million files in our filesteam filegroup (which means in one > directory on our binary array) but there are 500.000 more to go - and at > this speed this will take days (not even thinking about what it might > mean for performance afterwards). > > Searching for reasons, we realized that the inserts into the new table > took most of the time (over 80%), so we first stopped and rebuilt the > only index on that one (primary key) and, after realizing that this did > not help at all, we dropped it completely (since uniqueness is assured > through the original table anyways) - same result, no change, still way > too slow. I don't have any answer for you, so I've forwarded your post to my MVP colleagues, to see if they have anything to add. But I have one question: how do you write the data? Through INSERT, or do you use the OpenSqlFilestream to get a file handle for the files? -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
|
|||
|
> Did anybody run into a similar problem so far or is it in the end not even
> a > SQL server but a Windows / File System problem (since through > filestreaming > millions of files go into a single folder) ?! How many paths in your FILESTREAM filegroup? If you have only a single path, I'm curious if adding additional paths will improve performance. -- Hope this helps. Dan Guzman SQL Server MVP http://weblogs.sqlteam.com/dang/ "Andreas Zimmermann" <souldiver@nurfuerspam.de> wrote in message news:g29p93$o8u$2@aioe.org... > Hi, > > we just set up a SQL Server 2008 on Windows Server 2008 (configured as > file > / webserver, 16 GB memory, 4 dual core processors, 6 internal disks + 15 > disc ISCSI array with overall almost 3 TB space) to use filestream and ran > into the following problem: > > We extracted binaries from our 'original' database and imported it into > the > new 08 database using filestream - so far, so good. > > In the beginning everything looked pretty good, we extracted / imported > over > 100.000 rows per hour but over time the performance degraded significantly > (down to a roughly 4500 per hour). At this point, we have about 1 million > files in our filesteam filegroup (which means in one directory on our > binary > array) but there are 500.000 more to go - and at this speed this will take > days (not even thinking about what it might mean for performance > afterwards). > > Searching for reasons, we realized that the inserts into the new table > took > most of the time (over 80%), so we first stopped and rebuilt the only > index > on that one (primary key) and, after realizing that this did not help at > all, we dropped it completely (since uniqueness is assured through the > original table anyways) - same result, no change, still way too slow. > > Did anybody run into a similar problem so far or is it in the end not even > a > SQL server but a Windows / File System problem (since through > filestreaming > millions of files go into a single folder) ?! > > Any ideas or tips would be appreciated ... thanks in advance ! > > Kind regards > > Andy > > > |
|
|||
|
Well, that was an idea we also had ... by the 'standard' setup there is
only one 'file' (in this case path) in the filegroup - so we tried to add one and had to realize that this is not possible (so far ?!) ... you get a simple error message that filestream filegroups are not allowed to contain more than one path ... Dan Guzman wrote: >> Did anybody run into a similar problem so far or is it in the end not >> even a >> SQL server but a Windows / File System problem (since through >> filestreaming >> millions of files go into a single folder) ?! > > How many paths in your FILESTREAM filegroup? If you have only a single > path, I'm curious if adding additional paths will improve performance. > |
|
|||
|
At the moment we're using a 'straight forward' insert after extracting
the data from our original table(s) ... is there an advantage in using OpenSqlFilestream instead (if so we would certainly 'restrucure' the extract / import) ?! Erland Sommarskog wrote: > Andreas Zimmermann (souldiver@nurfuerspam.de) writes: >> we just set up a SQL Server 2008 on Windows Server 2008 (configured as >> file / webserver, 16 GB memory, 4 dual core processors, 6 internal disks >> + 15 disc ISCSI array with overall almost 3 TB space) to use filestream >> and ran into the following problem: >> >> We extracted binaries from our 'original' database and imported it into >> the new 08 database using filestream - so far, so good. >> >> In the beginning everything looked pretty good, we extracted / imported >> over 100.000 rows per hour but over time the performance degraded >> significantly (down to a roughly 4500 per hour). At this point, we have >> about 1 million files in our filesteam filegroup (which means in one >> directory on our binary array) but there are 500.000 more to go - and at >> this speed this will take days (not even thinking about what it might >> mean for performance afterwards). >> >> Searching for reasons, we realized that the inserts into the new table >> took most of the time (over 80%), so we first stopped and rebuilt the >> only index on that one (primary key) and, after realizing that this did >> not help at all, we dropped it completely (since uniqueness is assured >> through the original table anyways) - same result, no change, still way >> too slow. > > I don't have any answer for you, so I've forwarded your post to my MVP > colleagues, to see if they have anything to add. But I have one question: > how do you write the data? Through INSERT, or do you use the > OpenSqlFilestream to get a file handle for the files? > > |
|
|||
|
Andreas Zimmermann (souldiver@nurfuerspam.de) writes:
> At the moment we're using a 'straight forward' insert after extracting > the data from our original table(s) ... is there an advantage in using > OpenSqlFilestream instead (if so we would certainly 'restrucure' the > extract / import) ?! As I understand it, OpenSqlFilstream is more or less the whole point with FILESTREAM: you can write data through the Win32 file API, which should be faster than going through the needles eye of SQL Server. However, it's not likely to help you with the issue of number of files in the same directory getting larger than NTFS can cope with. I will have to say that this issue exists at all. It should be a no-brainer to spot this problem in the design. There seems to be a workaround, though. I don't know if Jimi2Cool is your alter ego, if he just a person with the same problem as you. In any case, he was able to diagnose the root of the problem and also able to address it. Whether his fix is supported remains to see: http://forums.microsoft.com/MSDN/Sho...iteID=1&mode=1 -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
|
|||
|
> Well, that was an idea we also had ... by the 'standard' setup there is
> only one 'file' (in this case path) in the filegroup - so we tried to add > one and had to realize that this is not possible (so far ?!) ... you get a > simple error message that filestream filegroups are not allowed to contain > more than one path ... My bad - I didn't consider separate filegroups are required for separate FILESTREAM folders. Another workaround might to create separate FILESTREAM filegroups and parition the FILESTREAM portion of the table so that data is distributed evenly in those folders. This is a bit of a kluge, though. I think the workaround Erland found is a less kludgey kludge ;-) -- Hope this helps. Dan Guzman SQL Server MVP http://weblogs.sqlteam.com/dang/ "Andreas Zimmermann" <souldiver@nurfuerspam.de> wrote in message news:g2bic1$2sj$1@aioe.org... > Well, that was an idea we also had ... by the 'standard' setup there is > only one 'file' (in this case path) in the filegroup - so we tried to add > one and had to realize that this is not possible (so far ?!) ... you get a > simple error message that filestream filegroups are not allowed to contain > more than one path ... > > Dan Guzman wrote: >>> Did anybody run into a similar problem so far or is it in the end not >>> even a >>> SQL server but a Windows / File System problem (since through >>> filestreaming >>> millions of files go into a single folder) ?! >> >> How many paths in your FILESTREAM filegroup? If you have only a single >> path, I'm curious if adding additional paths will improve performance. >> |
|
|||
|
In fact, he is a collegue of mine ... and we found about that particular
workaround on friday (and it solved the problem for now it seems) - I just did not have the time to post here ... but thanks for your heads up ! Erland Sommarskog wrote: > Andreas Zimmermann (souldiver@nurfuerspam.de) writes: >> At the moment we're using a 'straight forward' insert after extracting >> the data from our original table(s) ... is there an advantage in using >> OpenSqlFilestream instead (if so we would certainly 'restrucure' the >> extract / import) ?! > > As I understand it, OpenSqlFilstream is more or less the whole point > with FILESTREAM: you can write data through the Win32 file API, which > should be faster than going through the needles eye of SQL Server. > > However, it's not likely to help you with the issue of number of files > in the same directory getting larger than NTFS can cope with. I will have > to say that this issue exists at all. It should be a no-brainer to spot > this problem in the design. > > There seems to be a workaround, though. I don't know if Jimi2Cool is your alter ego, if he just a person with the same problem as you. In any case, > he was able to diagnose the root of the problem and also able to address > it. Whether his fix is supported remains to see: > http://forums.microsoft.com/MSDN/Sho...iteID=1&mode=1 > |
|
|||
|
Thanks ... we might have a look at this - although you're right, it
sounds like a kludge ![]() Dan Guzman wrote: >> Well, that was an idea we also had ... by the 'standard' setup there >> is only one 'file' (in this case path) in the filegroup - so we tried >> to add one and had to realize that this is not possible (so far ?!) >> ... you get a simple error message that filestream filegroups are not >> allowed to contain more than one path ... > > My bad - I didn't consider separate filegroups are required for separate > FILESTREAM folders. Another workaround might to create separate > FILESTREAM filegroups and parition the FILESTREAM portion of the table > so that data is distributed evenly in those folders. This is a bit of a > kluge, though. I think the workaround Erland found is a less kludgey > kludge ;-) > |
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Re: Reading MS SQL Server files | Alan Churchill | Newsgroup comp.soft-sys.sas | 0 | 07-16-2008 08:08 PM |
| Re: Reading MS SQL Server files | Sigurd Hermansen | Newsgroup comp.soft-sys.sas | 1 | 05-29-2008 12:30 PM |
| Re: SAS Data Set Name Restrictions when it is really a SQL Server | data _null_, | Newsgroup comp.soft-sys.sas | 0 | 10-08-2007 06:01 PM |
| Re: not seeing all tables in SQL server | Darryl Putnam | Newsgroup comp.soft-sys.sas | 0 | 11-28-2005 07:31 PM |