Wednesday, October 5, 2011

PowerPivot and SQL Azure

When PowerPivot was released, I was sceptical. Yes, it could be a great tool for in-house reporting, but not so with a SaaS application. My experience was that SaaS reporting should be embedded in the application. Maintaining a separate reporting systen such as SSRS, or even Telerik Reporting, in addition to the main application would be too much work. And it turned out that a ASP.NET ROLAP control like DevExpress ASPxPivotGrid would be good enough for both standard reports and custom reports, eliminating the need for another reporting system.

This reasoning applies if there are extra costs involved in supporting a new system. PowerPivot, however, is purportedly a self-service system. If it doesn't impose new costs, why not support it ? Unfortunately, even if there are no report authoring costs, there are still system-level problems that have to be solved before PowerPivot can be used in the public internet. First, you need to configure networking to allow secure access to your data. A direct SQL connection to your database is not always a good idea, and data services can take too much time to develop and set up. Once there is access to data, the next problem is to make sure that your database system scales up to the new demand. Alternatively, you may want to implement resource management so that PowerPivot doesn't hog up too much resources. These are hard problems for a small SaaS provider, but interestingly, they are also problems that Azure promises to solve.

With my simple application running in Azure, I was all set to experiment with PowerPivot. The easiest way to use PowerPivot with Azure is to connect directly to a SQL Azure database. This is done in two steps:

  1. In Azure management portal, add your current IP address to the firewall rules for the SQL Azure server. 
  2. In PowerPivot, connect to the database. There's a little catch: Ordinary database connection didn't work, PowerPivot just crashed Excel. You need to use "From Other Sources" - "Microsoft SQL Azure"



And there you have it, flexible PowerPivot access to your data. I'm impressed how well it complements web-based reporting. I've used it to answer questions about the data for which the PivotGrid does not adapt easily, such as percent-of-total calculations. The main restriction with PowerPivot is that it suits best for power users with full access to data. Setting up views for row-level security is too much hassle in most cases.

-mika-

Sunday, September 4, 2011

Moving to Azure


Today, I moved my small application to Azure. Rarely have I been this excited. Azure reminds me of the days when I was fifteen and Netscape Navigator was just released. I spend nights in my friend's place browsing the web with a 14kbps modem and a 486. The internet was nothing it is today, but it was exciting.

In my professional career, I have had to run dozens of times to a server facility, installing new servers, putting out fires on existing ones and so on. I have used multiple different customer VPN connections to different server environments, each with its own peculiarities. The hardware costs are reasonable, but the management burden is getting out of control as the business grows. The worst thing is that the quality of system administration is not what it should be, when developers like me work as accidental server, network and database administrators.

Azure promises to fix a lot of this. It is not perfect, and my skills are not yet fit to this new environment. Azure costs can run high compared to a cheap server. I'm more afraid of mistakes, downtime and data loss, but have less control to fix things. There is no simple SQL Azure backup system. I'm not convinced that I could recover a broken webapp fast. At least there is Remote Desktop now.

But once I got my simple app out of a shared physical server to its own corner of the Azure cloud, I knew there is no way back.

-mika-

Friday, March 18, 2011

Flowmarks Events

Ten years ago I used Excel heavily. The usage pattern was common. What started as a simple spreadsheet quickly evolveld into an information system. Mostly a good information system too: Excel was extremely fast and flexible. The problem with Excel was that a file-based information system worked well only on a local machine with a single user. Once you needed remote access to your data with write privileges, things started to fall apart.

Today, everything worthy of an information system is in the web. Money, time, sports, health.. you name it. Besides remote access and concurrency, webapps offer benefits of specialization that Excel can never do. Money and time management webapps support automated data entry. Sports trackers encourage sharing in social networks. Nutrition and health apps have expert system features like recommendation engines. In specialized applications, out-of-the box reports are usually relevant too. In Excel, you have to define calculations and charts yourself. And then there is the whole smartphone support thing.

But even with all theses improvements, there was something I was missing from the Excel days. Call it versatility and freedom. Compared to Excel, webapps have significant drawbacks:
  • setting up the account, credential management for each app
  • learning a different UI for each new app, tolerating with what you get
  • not having complete control over your data, data loss in the worst case
  • limited reporting, you can't define the calculations and reports you want
  • bad performance, downtime
  • subscription costs, or a free service with no rights to your own data
These drawbacks are acceptable if the payoff is there, but especially with smaller datasets, the benefits may not be worth the costs. Personally, I wanted to track small, ad hoc time series, with pivot table reporting. I didn't really want to use another isolated webapp for this, so I rolled up my own to handle them all.

It is called Flowmarks Events.

You can download it from CodePlex, or just use it at http://events.flowmarks.com.
There is also little documentation and a read-only demo (with book reading data) that has no login requirement.

The events app is a DotNetNuke module. Unfortunately, for licensing reasons, I can't distribute the reporting module that is available in the website.

-mika-

Saturday, February 5, 2011

Diving Into HTML5

Early this year I was working on a user interface refresh and refactoring project with an ASP.NET Web Forms application. In the process I read a great book by Mark Pilgrim, Dive Into HTML5. It presents a nice overview of the new features in HTML5. The most important thing I learned from the book was that HTML5 is something that deserves thought today, not in some distant future.

Data entry forms make a good example. HTML5 provides multiple new web form input types and attributes. In essence, HTML5 has many of the input types and attributes built-in that you would end up adding anyway, using javascript libraries like jQuery & plugins or server-side proprietary controls such as Telerik or DevExpress (for ASP.NET). These include the placeholder and the autofocus attributes for textboxes, specialized input types for common data types (date pickers, email address fieldsnumber selectors like spinboxes and sliders), and also client-side validation support for required fields and email addresses. Many of these are already supported in the latest browsers and should degrade gracefully in older browsers, including IE6. 

Yet, there are two problems. First, in order to provide a consistent user experience, you'll need scripted fallbacks for older browsers. Detecting HTML5 support and implementing fallbacks adds extra work. Personally, I would be surprised if more than a tiny fraction of websites implemented fallbacks comprehensively. And this is something that affects UI development for a long time. I would predict that fallbacks could be dropped at earliest in about 2015 when new machines are shipped with Windows 8 and the great majority of browsers supports enough of the HTML5 features.

Second, what if you are using proprietary server-side controls? The situation is unclear, and it is up to the vendors how and when HTML5 is supported. The problem is that using HTML5 features and control's own similar features together, without HTML5 detection and fallback, may cause clashes. For example, I found that HTML5 placeholder and Telerik RadTextBox EmptyMessage didn't play well together in Chrome 8. Consequently, I wouldn't use HTML5 features together with similar server control features until compatibility is assured.

Frankly, the more I think about it, the less it seems possible that HTML5 could ever replace the current nonstandard "advanced" input controls. Advanced controls are not just interchangeable standard parts, they have large and differentiated feature sets. For example, see jQuery UI Datepicker, DevExpress Date and Time Editors and Telerik RadDatePicker. Realistically, it's going to be an either-or choice between some custom control and a HTML5 standard control. And fallbacks are required only if you choose HTML5.

In the end, it is all about options. How valuable is it to use HTML5 during the lifetime of the application? Having the HTML5 option is not cheap, but giving it up means using the lame HTML4 controls or taking the possibly troublesome road of custom nonstandard dependencies. On the other hand, if your application will never see HTML5 capable browsers, or if the HTML5 features would be worse than the current ones, it makes no sense to use HTML5 at all. With these considerations in mind, I chose to use the Telerik input controls in my UI refactoring project. But if the application would not have been so dependent on ASP.NET standard controls, a solution using HTML5 and client-side fallbacks would have been appropriate too.

-mika-

Sunday, June 20, 2010

DevExpress ASPxPivotGrid Performance

In the previous post I introduced DevExpress PivotGrid, a web-based pivot table component. The tool itself is impressive, but I was somewhat suspicious about its performance in real-world reporting applications, and promised to look at this more. There are three potential performance bottlenecks in web-based pivot tables:
  • Querying data from the data source and getting the results back
  • Processing data in the application server
  • Loading and rendering the page in the browser
Query Performance
Raw query performance should be the same for every BI client. Fundamentally, the only thing a client can do to dramatically improve query performance is to cache the results. This is what in-memory engines like VertiPaq are all about. But caching leads to a tradeoff between timeliness and performance. You won't get real-time results from a cache. If your cache expires, say, once in an hour, the average staleness of your data is 30 minutes, which may or may not be too high given your business requirements.

Nevertheless, caching benefits most those source systems that are not well equipped for analytical workloads, like most databases behind line-of-business web applications. These are typically operational applications designed for many small concurrent reads and writes. They are not designed to handle analytical workloads that emphasize read performance. In analytical databases like relational data warehouses and OLAP databases, read performance is boosted by materialized views and heavy indexing, both of which are harmful to operational write performance. Typical analytical workload in a normalized, OLTP-friendly relational schema requires a lot of joins, run-time aggregations and multi-pass queries, which will be slow. On the other hand, read-optimized analytical databases do not usually contain real-time data to begin with, so there is no trade-off in using caching with them either. The message to take home is that if you intend to use PivotGrid for real-time operative reporting without using a cache, you are most exposed to the query bottleneck.

My experience is that query time from relational sources is easily the most dominating factor. Caching makes all the difference between a painful and enjoyable reporting system. Real-time reporting with slow queries is not fun to use, and users more often than not prefer fast stale data to slow real-time data. The hard problem here is to find an optimal cache invalidation policy. I won't go further, but the trade-offs involved will make you think hard about the usage patterns and requirement priorities.

Caching application data with ASP.NET means storing the PivotGrid's datasource object server-side in an application or session level store. Usually caching is done at the application level with System.Web.Caching.Cache, which has features like key dependencies to invalidate the cache. In the code snippet below, SqlDataSource1 retrieves data from cache as long as the CacheKeyDependency does not change.
Dim cacheKeyDependency = keyString
If Cache(SqlDataSource1.CacheKeyDependency) <> cacheKeyDependency Then
Cache(SqlDataSource1.CacheKeyDependency) = cacheKeyDependency
End If
PivotGrid.DataSource = SqlDataSource1

Application server performance
In web-based analytics, report processing is centralized. With ASP.NET this means that IIS application pool (w3wp.exe) will do all the processing. Once the query bottleneck is solved, it is likely that the application server becomes the next pain point. It is not possible to improve application server performance much with output caching since pivot tables are generated dynamically and HTML output is going to change all the time.

Server-Side Performance Tests
Eventually, it is the request execution time that matters, and you must empirically measure the query time and the application server's contribution. To separate the two, I ran tests with the cache on and off using a setup similar to Bruce Chapman's tests on swiching off the .aspx page extension on DotNetNuke friendly urls. My test loaded eleven different PivotGrids with similar HTML size. HTML size was restricted by having a maximum of ten rows and less than 20 columns in each loaded PivotGrid view.

The performance measures followed can be grouped to:
  • latency measures: Time to first byte (TTFB), Time to last byte (TTLB)
  • throughput measures: Requests per second
  • resource utilization measures: Processor time, Memory, IO
First, caching produces an obvious performance boost. As a matter of fact, I couldn't even get comparable latency and thougput measurements without cache since IIS didn't respond requests reliably then. It is best to visualize the difference with the processor time counter in performance monitor.

Figure 1. Processor time with cache


Figure 2. Processor time without cache


With cache and 100% hit rate, five runs were performed, and the results are shown below. The average TTFB was 1730 ms, and average TTLB only ten milliseconds higher. The small difference tells that there was practically no networking latency in my localhost test setup. Throughput was at 0,6 requests per second, which no one would call scalable. On the other hand, byte throughput was at 1,4 MBit per second, which means that the average response size was as large as 0,8MB.


These results are not generally replicable since the PivotGrid data in the test depends on timestamp values. Measurements taken on a same day should produce the same results though. But strictly, this is not the case. There is some variability even in subsequent runs that complete flawlessly. Median TTFB and TTLB vary considerably, since the number of requests is small (17-18), but average values are fairly consistent. And there are runs that won't complete perfectly even with cache on.

I won't claim to be a ASP.NET performance monitoring specialist, but I think the WCAT results are unambiguous. Overall, the results tell that my PivotGrids easily generate a heavy load on the server, and scalability problems are expectable. Under optimum conditions with 100% cache hit rate I could serve less than one page per second. In single user scenarios, response times are reasonably low though. Without cache, even a single user is too much for the server.

Client-side performance
PivotGrids are fairly static, in the sense that most of the client-side performance depends on loading data, not on javascript performance. And data sizes depend on what you ask for. That said, with sensible row count restrictions and default visible columns, HTML size should be few hundred KB at most.

PivotGrid reloads when you add new fields, and dropdownlists populate on demand. I have not studied the postback mechanism yet, but PivotGrid didn't work inside a UpdatePanel and DotNetNuke compression caused hangups too. Also, if a PivotGrid field is incorrectly mapped to a SQL column name (case-sensitive!), PivotGrid just hangs when the field is needed. Overall, not being able to use some methods of compression is perhaps the most serious client-side performance drawback, since you may have to turn off compression settings that would help other pages in your site.

Update: Setting EnableCallbackCompression to false on the ASPxPivotGrid and ASPxGridView controls solved the DNN compression and UpdatePanel incompatibilitity problem. You may also have to disable built-in callbacks by setting EnableCallbacks=false.


Conclusion
DevExpress PivotGrid is a high-level tool. You can't optimize it much. Instead, optimize queries, use caching, and be careful with the default parameter values and data sizes.

-mika-

Thursday, April 29, 2010

DevExpress ASPxPivotGrid

Lately I've been developing a reporting module for a web application used by our customers. This reporting module is still in very early stages, but it has the basic elements you would expect in a reporting tool: parameterized standard reports, ad-hoc reports, dashboards and charting. Of course, because it is part of our application and not an external service like SSRS, integration with other modules is very tight. For example, reporting can read other modules' settings and customizations and behave accordingly. This helps a lot in creating customer-specific reports without duplicating report code.

In this project I've been using technologies I've written about before: FusionCharts Free, jQuery, Telerik RadControls and Telerik Reporting. These technologies work fine in their respective domains. FusionCharts is great for dashboard visualizations, it has cool animated charts and drill-down support. jQuery is used everywhere in the UI, but not that much on the reporting itself. Telerik RadControls suite is used mostly in the navigation and report filtering controls. And there are a handful of Telerik Reporting reports used by some of our customers.

But the most interesting thing here is the ad-hoc reporting component: DevExpress ASPxPivotGrid.



ASPxPivotGrid is an ASP.NET component for creating crosstabs dynamically from relational or OLAP datasources. It is essentially what is known as a cube browser in the industry. But undoubtedly, a lot more people would call it a "pivot table" just because Excel has a similar functionality. Pretty soon, analysts will be introduced to the new PowerPivot tool. This means that the word "pivot" is destined to become, if not a household name, then at least a corporate buzzword. In my opinion, pivoting live data definitely has spreadsheet-level implications if done right. "Right" means multiple things here: data quality, general availability, high usabilitity, performance, developer support and enterprise-level security, to name a few.

DevExpress has done a lot of things right with PivotGrid. Web platform is even more generally available than Excel. Usability seems good enough, since our users pick up the necessary skills during demo meetings. Charting and drill-down are relatively easy to integrate. Performance is decent for a web control, but sure you can kill your server and browser with data. There is no built-in security, which is understandable for an ASP.NET control. This means that visibility of data is managed by the developer. Localization and theming are harder than they should, though. I'm used to the simplicity found in localizing and theming Telerik controls, even if they do not follow Microsoft's recommendations strictly.

But the most important thing is something I should stress more. After talks with our customers, it became clear that most business value was created by pivoting and integrated charting. No doubt, we showed them everything, but OLAP-style pivoting and instant integrated charts made a lasting impression.


From the developer's perspective, pivot tables are also the easiest tools in the long run, since you get customizations out of the box, which would require a lot of work in static reports.

There is lot more to say about web-based pivoting. In the next week or so, I'll talk more about the key issue of web-based ad hoc reporting: performance.

-mika-

Sunday, March 21, 2010

Full-service vs. Self-service Software

I just finished W.E. Pete Peterson's Almost Perfect, a highly recommended story of the rise and fall of WordPerfect Corporation.

One thing that struck me was the distinction between the old, full-service IBM model and the emerging off-the shelf PC software model:

As large corporations made the transition from the full-service old world to the self-service new world, they were desperately trying to find a way to extract the same level of service from their vendors, even though they were paying much lower prices.. They acted like spoiled rich people who were trying to save money on their vacation by going to a two-star resort instead of the five-star resort they were used to. In my mind I could see them sitting in their boats in the middle of a lake, yelling at the top of their lungs, "Where is my drink, where is my lunch, where is my magazine, and who is going to row my boat?" I imagined myself there as the manager of the resort politely yelling back to them, asking them to row their boat to the shore so I could explain where they needed to go to get what they wanted. Of course, they were not likely to pick up their oars, but I certainly was not going to swim out to their boat and row it for them. It would be foolish to give in to their demands and offer them five-star service at a two-star price.
http://www.wordplace.com/ap/ap_chap14.shtml

In today's world, the question of price point and service level is something that every SaaS vendor has to deal with. But the SaaS model is also more flexible in this respect. Customers who want five star service can buy it to supplement the self-service software. Some customers even want the end result as a service, and it is the SaaS vendor or a third party who actually uses the software to get this result.

I think this flexibility is what customers want. But the considerations raised by Peterson are present today just as they were before.
  • There must be price discrimination between full-service and self-service. It is unfair to provide higher level of service, for the same price, to those customers who complain the most.
  • There must be a clear distinction between services that do require custom software modifications and ones that don't. Also, trying to duct-tape custom modifications into self-service software generally leads to great unhappiness.
  • Responsibility over customer's systems and data must be clearly defined. In the SaaS world, vendors must take more responsibility over data loss due to software errors. But solving issues due to customer's errors or system configuration (browsers, security software etc.) should never be a free service.
-mika-